ABSTRACT


This  study  analyzes  data  from  a  select  group  of  active  duty  (AD)  service  members 
enrolled  to  the  Puget  Sound  area  Navy  military  treatment  facilities  (MTF)  in  order  to 
develop  a  model  that  identifies  the  risk  that  opioid  users  will  become  high  opioid  users, 
as defined by Navy Bureau of Medicine and Surgery (BUMED). The analysis models
the response variable, high opioid user, as a function of a
number of explanatory variables, including patient age, deployment history, sources of
prescription and medical diagnoses. Logistic regression and machine learning models are
used  for  data  analysis. 

The study concludes that a simple, executable model that consolidates the
variables to two explanatory factors performs as well as, if not better than, the more
complicated  machine  learning  models.  The  two  highly  influential  factors  are  the  number 
of  prescription  sources  for  opioid  medications  and  the  total  number  of  diagnoses. 

This logistic regression model has the potential to help Navy Medicine make
important decisions for its opioid-prescribed patients. With the ability to identify the
risk  that  an  opioid  user  becomes  a  high  user,  healthcare  leaders  can  better  manage 
resources  to  focus  on  the  prevention  and  treatment  of  higher-risk  patients.  This 
concentrated  coordination  can  result  in  improved  patient  care  for  this  sub-population, 
reduced  long-term  cost  for  the  military  healthcare  system  and,  overall,  a  more  medically 
ready  military  force. 
EXECUTIVE SUMMARY


In  an  effort  to  build  a  model  to  identify  the  risk  that  opioid  users  may  become 
high  users,  our  study  examines  explanatory  factors  that  influence  opioid  use.  The  data  is 
provided  by  the  Analytics/Enterprise  Support  Services  Department  of  BUMED  and 
focuses  on  active  duty  (AD)  service  members  enrolled  to  the  Puget  Sound  area  Navy 
military treatment facilities (MTF). The analysis models the
response variable, high opioid user, as a function of 91 explanatory variables, including
patient age, deployment history, sources of prescription and medical diagnoses. Basic
logistic  regression,  elastic  net  penalized  logistic  regression,  random  forest  and  boosted 
tree  classification  models  are  used  for  our  data  analysis. 

We  plotted  cross-validated  receiver  operating  characteristic  (ROC)  curves  to 
compare  model  performance  and  to  avoid  over-fitting  for  the  random  forest  and  boosted 
tree  models.  Although  simpler,  the  basic  logistic  regression  model  performs  well  when 
compared  to  the  complex  machine  learning  models.  The  logistic  regression  model  is  also 
easier  to  reproduce.  Just  as  importantly,  the  output  is  easy  to  understand  and  interpret. 
The log-odds of being a high user are a linear function of the two explanatory
variables in the final logistic regression model and are thus, conceptually, easier to
communicate.
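
As an illustration of how a ROC curve can be summarized for model comparison, the sketch below computes the area under the curve from predicted scores and observed labels. It is written in Python for illustration only (the thesis analysis was carried out in R); the function name and inputs are assumptions, and cross-validation, which would require computing this on held-out folds, is not shown.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen high user (label 1)
    receives a higher score than a randomly chosen non-high user
    (label 0); tied scores count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value near 0.5 indicates scores that rank high users no better than chance, while a value near 1.0 indicates that high users are almost always ranked above the others.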

Therefore,  the  recommended  model  for  BUMED  is  a  logistic  regression  model 
with two explanatory variables, without interactions. These two variables, the number of
prescription sources for opioid medications and the total number of diagnoses, are
constructed from the original data files from BUMED and encompass the majority of the
91 explanatory variables.
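
The general form of a two-variable logistic regression without interactions can be written as follows. The symbols are notation assumed here for illustration, not taken from the thesis: $p$ is the probability that an opioid user is a high user, $x_1$ is the number of prescription sources, $x_2$ is the total number of diagnoses, and $\beta_0, \beta_1, \beta_2$ are the fitted coefficients (no values are implied).

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2,
\qquad
p = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)\bigr)}
```

Under this form, a one-unit increase in $x_1$ changes the log-odds by $\beta_1$ regardless of the value of $x_2$, which is what makes the model straightforward to explain.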

A lift curve is used to make the model easier for decision makers to interpret.
The curve shows that, with limited resources, if MTFs focus on the percentage of the
opioid user population with the highest estimated probability of high opioid use, the
probability of identifying a high user improves by the amount of the lift.
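
The idea of lift can be sketched in a few lines. The function below is a minimal illustration in Python (the thesis used R), with hypothetical names: it ranks patients by estimated probability, keeps the top fraction, and divides the high-user rate in that subset by the rate in the whole population.

```python
def lift_at_fraction(probabilities, is_high_user, fraction):
    """Lift obtained by reviewing only the top `fraction` of patients.

    probabilities: estimated probability of high use for each patient.
    is_high_user:  observed 0/1 outcome for each patient (same order).
    fraction:      share of the population to review, in (0, 1].
    """
    n = len(probabilities)
    if n == 0 or n != len(is_high_user):
        raise ValueError("inputs must be non-empty and of equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    overall_rate = sum(is_high_user) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no high users")
    # Number of patients reviewed; always review at least one.
    k = max(1, round(n * fraction))
    ranked = sorted(zip(probabilities, is_high_user),
                    key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    return top_rate / overall_rate
```

A lift of 3 at a fraction of 0.2, for example, would mean that high users are three times as concentrated in the top 20% of ranked patients as in the opioid user population as a whole.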


This logistic regression model has the potential to help Navy Medicine make
important decisions for its opioid-prescribed patients. With the ability to identify the
risk that an opioid user becomes a high user, health care leaders can better plan and
manage finite resources to focus on the prevention and treatment of the higher-risk
patients. This concentrated coordination of care can result in improved patient care for
this sub-population, reduced long-term cost for the military health care system and,
overall, a more operationally ready workforce.

This research is an initial effort to explore ways to identify opioid users who may
have a greater risk of becoming high opioid users. Future studies can also
examine data on patients who did not have opioids prescribed to compare the risk factors
of becoming an opioid user.
I. INTRODUCTION


In  the  United  States,  there  is  a  growing  epidemic  that  until  recently  has  not 
received  much  media  coverage:  the  use  of  opioids  to  relieve  pain.  Opioids  are  a  type  of 
narcotic,  commonly  prescribed  for  pain.  Roughly  20%  of  patients  with  pain-related 
diagnoses  are  prescribed  an  opioid  (Chou,  Dowell,  &  Haegerich,  2016).  According  to  the 
National  Institute  on  Drug  Abuse  (NIDA),  opioids  can  be  natural,  semisynthetic  or 
synthetic.  The  drugs  provide  relief  by  reducing  the  intensity  of  pain  signals  to  the  brain 
and  this,  in  turn,  minimizes  the  effects  of  the  painful  stimulus  (NIDA,  2014).  Some 
common  medications  that  are  considered  opioids  include  hydrocodone,  oxycodone, 
morphine  and  codeine  (NIDA,  2014). 

Opioid  pain  medications  can  present  serious  risks  for  the  patient,  including 
dependency,  overdose  and  opioid-use  disorder.  Opioid  abuse  has  become  the  leading 
cause  of  preventable  deaths  in  the  United  States  (Rudd,  Aleshire,  Zibbell,  &  Gladden, 
2016).  In  2014  alone,  according  to  the  same  source,  there  were  over  47,000  deaths 
attributed  to  drug  overdose  and  61%  of  those  deaths  involved  opioid  overdoses.  That  is 
roughly  25%  more  deaths  than  from  either  firearms  or  motor  vehicle  accidents.  The 
Centers  for  Disease  Control  and  Prevention  (CDC)  has  historically  characterized  all 
opioid  pain  reliever  deaths  as  prescription  opioid  overdoses  (Rudd  et  al.,  2016). 
Additionally,  the  numbers  continue  to  dramatically  increase;  Figure  1,  taken  from  a  CDC 
(2015)  report,  shows  that  the  rate  of  opioid  overdoses  has  tripled  since  2000.  This 
increase  is  alarming  and  present  in  all  demographics,  regardless  of  sex,  age  or  race  (Rudd 
et  al.,  2016).  The  focus  of  this  study  is  a  specific  population  of  opioid  users,  active  duty 
(AD)  military  personnel. 
[Figure 1 image: "Opioid overdoses driving increase in drug overdoses overall."
Age-adjusted rate of drug overdose deaths and drug overdose deaths involving opioids,
United States, 2000-2014. SOURCE: Centers for Disease Control and Prevention.
Increases in Drug and Opioid Overdose Deaths - United States, 2000 to 2014.
MMWR 2015. www.cdc.gov/drugoverdose]

Figure 1. Overdose Death Rates from 2000-2014. Source: CDC (2015).


A.  BACKGROUND 

Several  important  factors  contribute  to  the  increase  in  opioid  use  and  abuse.  The 
liberalization  of  laws  governing  the  treatment  of  chronic  pain,  the  aggressive  marketing 
efforts  of  the  pharmaceutical  industry  and  the  introduction  of  a  different  pain 
management  standard  that  began  in  the  1990s  have  all  played  major  roles.  Prior  to  1990, 
U.S.  physicians  took  a  minimalist  approach  to  treating  chronic  pain  patients  (Levy, 
Netzer,  &  Pikulin,  2014). 

In 2015, the CDC published suggested guidelines for prescribing opioids for
chronic pain in the United States (Chou et al., 2016). These guidelines specifically focus
on affecting medical providers’ behavior to ensure the safest and most effective treatment
for their patients. The guidelines also discuss the use of opioids in treating chronic pain.
The guidelines do not target treatment of patients with cancer, palliative care or
end-of-life care. Instead, they are intended for primary care providers, who treat
chronic-pain patients in outpatient settings, as they account for almost half of all opioid
prescriptions. Chou et al. (2016) noted that the recommendations are based on three key
principles. The first is that non-opioid therapy is the preferred method for chronic pain
treatment. The second is that the lowest possible opioid dosage should be selected to
reduce risk of overdose. The third is that providers should always exercise caution when
prescribing opioids while closely monitoring their patients.

Opioid abuse is not just a problem for the civilian population. It is a problem for
our nation’s military personnel and veterans as well. According to a Veterans Affairs
(VA) study, veterans are twice as likely as non-veterans to die from accidental opioid
overdose (Childress, 2016). Additionally, Childress (2016) noted that more than half of
veterans suffer from chronic pain, compared to only about 30% of the general
population, where chronic pain is defined “as pain that lasts longer than three months or
past the time of normal tissue healing.” Until a few years ago, veterans with chronic pain
were  treated  exclusively  with  opioids.  The  prevention,  assessment  and  treatment  of 
chronic  pain  remain  a  tremendous  challenge  for  health  care  providers  (Childress,  2016). 
The  Navy  AD  population  is  on  average  much  younger  than  the  general  population. 
Nevertheless,  in  a  recent  Center  for  Naval  Analyses  (CNA)  study  of  four  large  Navy 
military  treatment  facilities  (MTF),  roughly  25%  to  32%  of  all  AD  beneficiaries  received 
at  least  one  opioid  prescription  during  fiscal  year  (FY)  2013  (Levy  et  al.,  2014). 

The  United  States  Navy  Bureau  of  Medicine  and  Surgery  (BUMED)  stresses  in  its 
vision  statement  “our  health  care  is  patient-centered  and  provides  best  value,  preserves 
health,  and  maintains  readiness”  (Goff  &  Sayers,  2015).  Thus,  two  of  BUMED’s  three 
strategic  principles  are  value  and  readiness.  More  specifically,  under  the  value  principle, 
the  goal  is  to  decrease  enrollee  network  cost  by  optimizing  resource  utilization  and 
managing  referrals  in  order  to  provide  the  best  care  at  the  best  value.  Under  the  readiness 
principle, the goal is to deliver ready capabilities to the operational commander by
aligning Navy Medicine’s “manning, training, and equipping to maintain a medically
ready force” (Goff & Sayers, 2015).

According to Levy et al. (2014), around 80% of health care resources for Navy
beneficiaries are devoted to patients with chronic pain. Since chronic pain patients are
typically prescribed opioids, this group drives a disproportionate amount of the
population’s cost to the health care system. Additionally, from an operational and
readiness standpoint, the Navy may not be able to deploy patients who have been
prescribed opioids for chronic pain or those who have many of the associated comorbid
conditions. Thus, the need to identify and manage the high opioid user population is one
of strategic importance that aligns with BUMED’s strategic principles of value and
readiness.

B.  PURPOSE  OF  THE  STUDY 

High  opioid  use  is  defined  by  the  Navy  and  Marine  Corps  Public  Health  Center 
(NMCPHC)  as  having  five  or  more  prescriptions  dispensed  for  select  pain  medications 
within  90  days  (Broad,  2016).  Identifying  a  potential  high  opioid  user  early  will  allow 
health  care  professionals  and  leaders  to  more  closely  monitor  this  group  of  beneficiaries 
to  ensure  they  receive  comprehensive  care  while  mitigating  the  cost  and  operational 
impact  on  the  patient’s  parent  organization. 
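
The NMCPHC definition can be expressed as a sliding-window check over a patient's dispensing dates. The sketch below is illustrative Python (the thesis used R) and rests on stated assumptions: the function name is hypothetical, the dates passed in are already restricted to the select pain medications, and "within 90 days" is read as five events whose first and last dates are fewer than 90 days apart. The exact windowing rule applied by BUMED is not specified in this chapter.

```python
from datetime import date, timedelta


def is_high_user(dispense_dates, min_events=5, window_days=90):
    """Return True if any `window_days` span holds at least `min_events` events."""
    dates = sorted(dispense_dates)
    start = 0
    for end, current in enumerate(dates):
        # Drop events from the front until the window is shorter than window_days.
        while (current - dates[start]).days >= window_days:
            start += 1
        if end - start + 1 >= min_events:
            return True
    return False


# Five dispensing events between 1 January and 10 March 2015 (68 days apart).
clustered = [date(2015, 1, 1), date(2015, 1, 15), date(2015, 2, 1),
             date(2015, 2, 20), date(2015, 3, 10)]
# Five dispensing events spaced 30 days apart (120 days from first to last).
spread_out = [date(2015, 1, 1) + timedelta(days=30 * i) for i in range(5)]
```

Under these assumptions the clustered pattern meets the definition and the evenly spaced pattern does not, since no 90-day span of the latter contains more than three events.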

This  study  examines  over  eighty  demographic  and  patient  medical  variables  for 
opioid  users  in  an  AD  military  population  and  builds  a  simple  logistic  regression  model 
to  estimate  the  probability  an  opioid  user  from  this  population  is  a  high  opioid  user. 
While  this  model  is  not  good  for  classifying  a  particular  individual  as  a  high  opioid  user, 
we  show  that  it  can  be  used  to  identify  the  increase  in  the  concentration  of  high  users  in  a 
smaller sub-population. We also show that the simple model performs as well as or better than
more complex machine learning models (penalized logistic regression, random forests,
boosted trees) fit with the same data. The results of this study can be used by BUMED to
help  achieve  its  strategic  goals  in  the  areas  of  readiness  and  value. 

C. ASSESSING AND DEFINING HIGH OPIOID USERS AMONG AD NAVY POPULATION

Across  different  health  care  systems,  multiple  methodologies  and  definitions  are 
used  for  patients  treated  for  chronic  pain.  In  2015,  BUMED  established  a  comprehensive 
case  definition  to  assess  and  identify  opioid-prescribed  patients  enrolled  in  Navy  MTFs. 
The adopted definition for a high opioid user is the same as NMCPHC’s definition of five
or more dispensing events of a medication likely to be associated with pain during the
course of a 90-day period (Ellis, 2015). The types of medication that fall within
this category are listed below and include certain therapeutic classes and selected
nonsteroidal anti-inflammatory drugs (NSAID) likely to be associated with pain
(Ellis, 2015).


•  Opiate  Agonists 

•  Opiate  Partial  Agonists 

•  Skeletal  Muscle  Relaxants 

•  Centrally  Acting  Skeletal  Muscle  Relaxants 

• Direct-Acting Skeletal Muscle Relaxants

• GABA-Derivative Skeletal Muscle Relaxants

•  Skeletal  Muscle  Relaxants,  Misc. 

•  Selected  NSAIDS  (Aspir,  Celecoxib,  Ketoro,  Cambia,  Rub,  Sulindac) 

Based  on  those  medications,  BUMED  extracted  administrative  medical  data  and 
the  Pharmacy  Detail  Transaction  Service  (PDTS)  data  from  the  Military  Health  System 
Management Analysis and Reporting Tool (M2). The data includes only AD service
members with at least one opioid prescription in FY2014 or FY2015 who were enrolled
to the Puget Sound area Naval MTFs. These are the five facilities:

•  Naval  Hospital  Bremerton 

•  Naval  Hospital  Oak  Harbor 

•  Naval  Branch  Health  Clinic  Everett 

•  Naval  Branch  Health  Clinic  Bangor 

•  Naval  Branch  Health  Clinic  Puget  Sound 

Patients diagnosed with any of the cancer-related codes were excluded from the
high opioid user criteria and removed from the PDTS data. Appendix A lists the codes
associated with cancer diagnosis. This group of patients is already closely monitored
and specifically prescribed opioids for their cancer-induced pain. Based on the
pseudo-identification (ID) code representing each member in the PDTS data, BUMED provided a
risk data file that contained additional information about each particular patient. The
details of each field will be discussed in Chapter III. Due to the Health Insurance
Portability and Accountability Act of 1996, which specifically deals with protected health
information as well as personally identifiable information, much of the demographic
information was removed from the data files.

There are a few limitations with this study.

•  Reporting  errors  due  to  improper  or  insufficient  medical  coding  as  well  as 
data  entry  errors  at  the  clinic  may  exist  in  M2  data.  Furthermore,  care 
delivered  in  the  operational  setting  may  not  be  documented  in  this  system. 

•  The  PDTS  table  includes  data  for  all  prescriptions  dispensed  by  an  MTF, 
civilian  pharmacy,  or  mail  order.  It  cannot  be  determined  if  the  patient 
was  compliant  with  taking  the  medication  as  instructed. 

•  Patients  with  cancer  diagnoses  that  did  not  occur  at  the  same  time  as  their 
pain  diagnosis  could  be  included  in  this  analysis. 

•  Potential  high  opioid  users  that  have  changed  enrollment  sites  during  the 
FY2014  or  FY2015  time  period  may  not  be  detected  as  a  high  user. 

•  Since  the  reporting  period  covers  24  months,  patients  that  receive  opioid 
prescriptions  outside  of  this  period  will  not  be  accounted  for. 

D.  THESIS  ORGANIZATION 

Chapter  II  provides  background  information  on  high  opioid  users,  chronic  opioid 
users  (COUs),  and  the  connection  between  pain  and  opioids.  Chapter  III  provides 
descriptive  statistics  of  variables  used  in  the  study  and  gives  the  details  of  data 
preparation.  Chapter  IV  explores  the  methodologies  used,  a  description  and  assessment 
of  the  models  and  the  results  of  the  analysis.  The  final  chapter  presents  a  summary  of  the 
study  and  offers  recommendations  for  further  analysis. 
II. RELATED LITERATURE


This chapter examines previous studies on patients identified as high opioid users and COUs.
Additionally, we explore the relationship between a very specific patient group, the
military population, and opioid use. By examining the common factors among this
patient group, we hope to gain better insight into possible predictors for AD sailors who
may become high opioid users.

A.  HIGH  OPIOID  USERS 

According  to  the  Institute  of  Medicine,  pain  is  recognized  as  a  significant  public 
health  problem  in  America  with  over  100  million  people  experiencing  chronic  pain  (Levy 
et  al.,  2014).  The  treatment  of  chronic  pain  is  especially  challenging  for  health  care 
professionals. Because it is a complex condition, chronic pain can be defined in different
ways. According to the same source, chronic pain is defined as lasting for “greater than
three months or past the time of normal tissue healing” and can result from previous
medical conditions, injuries or unknown causes. An analysis of data from the 2012 National
Health Interview Survey showed that 11.2% of adults reported having daily pain (Chou et
al., 2016). In fact, Chou et al. noted that approximately one in three Americans will have
chronic pain in their lifetime and over 80% of the chronic issues are in the neck or lower
back. This source also reported that the majority of patients who experience chronic pain
are also diagnosed with depression. The belief is that the ongoing pain and disability
lead to frustration and eventually take a psychological toll (Chou et al., 2016).
Additionally, chronic pain in some people results from a traumatic event that may also
trigger post-traumatic stress disorder (PTSD) (“PTSD,” 2015). The same source
approximates  that  15%  to  35%  of  patients  with  chronic  pain  also  have  PTSD.  Only  2% 
of  patients  diagnosed  with  PTSD  do  not  have  chronic  pain.  Thus,  PTSD  and  chronic  pain 
have  a  very  clear  connection  (“PTSD,”  2015). 

Opioids  are  commonly  prescribed  for  non-cancer  pain  symptoms.  There  is 
always  the  risk  of  dependency,  abuse  and  opioid  use  disorder,  which  is  defined  as  a 
“pattern  of  opioid-use  leading  to  clinically  significant  impairment”  (Chou  et  al.,  2016). 
Thus, it is very important to identify and monitor patients who are considered high
opioid users.

The  Navy  and  Marine  Corps  Public  Health  Center  conducted  an  M2  data  pull  in 
2016  for  each  Navy  MTF  and  discovered  that  roughly  4%  of  all  AD  Navy  beneficiaries 
could  be  classified  as  high  users  (Broad,  2016).  The  percentage  may  be  a  little  higher 
overall,  as  about  2,200  people  fitting  the  description  of  high  users  were  excluded  because 
the  last  enrollment  record  did  not  classify  them  as  AD  or  Navy  enrollees.  Table  1  lists 
each  Navy  MTF  and  its  percentage  of  high  opioid  users. 


Table 1. High Users of Chronic Pain Medication among AD Navy Enrollees.
Source: Broad (2016).

Metric 1A

Region  Parent Enrollment Site   Enrollees (N)  Pain (ALL), N  Pain (ALL), %
        Total Navy Enrollees           432,417         16,115            3.7
        EAST Region                    244,740          9,804            4.0
        WEST Region                    187,677          6,311            3.4
E       JAMES A LOVELL FHCC              4,125            188            4.6
E       NHC CHARLESTON                   8,557            159            1.9
E       NHC NEW ENGLAND                 12,473            349            2.8
E       NH BEAUFORT                      5,913            339            5.7
E       NH CAMP LEJEUNE                 35,541          1,987            5.6
E       NH GUANTANAMO BAY                2,656             95            3.6
E       NH JACKSONVILLE                 23,062            923            4.0
E       NH NAPLES                        2,253             95            4.2
E       NH PENSACOLA                    28,682            933            3.3
E       NH ROTA                          2,155             84            3.9
E       NH SIGONELLA                     6,712            170            2.5
E       NHC ANNAPOLIS                    8,528            175            2.1
E       NHC CHERRY POINT                 8,997            568            6.3
E       NHC CORPUS CHRISTI               4,784            224            4.7
E       NHC PATUXENT RIVER               5,630            208            3.7
E       NHC QUANTICO                    11,318            402            3.6
E       NMC PORTSMOUTH                  73,354          2,905            4.0
W       NH BREMERTON                    14,036            362            2.6
W       NH CAMP PENDLETON               30,764          1,421            4.6
W       NH GUAM-AGANA                    3,543            104            2.9
W       NH LEMOORE                       7,807            295            3.8
W       NH OAK HARBOR                    7,642            231            3.0
W       NH OKINAWA                      16,776            539            3.2
W       NH TWENTYNINE PALMS              9,622            477            5.0
W       NH YOKOSUKA                     18,589            382            2.1
W       NHC HAWAII                      23,736            796            3.4
W       NMC SAN DIEGO                   55,162          1,704            3.1

Navy and Marine Corps Public Health Center, Health Analysis Department
Source: MHS Mart (M2) DEERS and CAPER tables [remainder of source line illegible], Feb 2016
B. CHRONIC OPIOID USERS


While  there  is  evidence  to  support  the  short-term  effectiveness  of  opioids  in  the 
reduction  of  pain,  the  evidence  is  not  as  clear  for  long-term  use.  Very  few  studies  have 
examined the effectiveness of opioids with outcomes beyond 12 months. Yet
researchers estimated in 2005 that 3% to 4% of the U.S. adult population was prescribed
long-term opioid therapy (Levy et al., 2014). Levy et al. also suggested that patients who have
a history of opioid prescriptions have a greater risk for overdose and opioid use disorder.
Thus, COUs, generally defined as patients who have been prescribed a 90-day or greater
supply of opioids, are of particular interest to BUMED and health care professionals
(Levy et al., 2014).

In the CNA study of chronic opioid use and lower-back pain among Navy
beneficiaries at the four large Navy MTFs, the authors evaluated opioid use in terms of episodes
of use, days of supply and dosage (Levy et al., 2014). Some of the important factors
quantified included the following:

• Was the patient also on an anti-depressant?

• Was an NSAID attempted to relieve the pain before the onset of opioid
therapy?

•  Did  the  patient  receive  drugs  from  other  pharmacy  sources  in  the  civilian 
sector? 

Table 2 shows the percentage of AD opioid users and the percentage of AD COUs
for each of the four large Navy MTFs. A key point not depicted in the
table is that, while the COU percentage among the AD population is low, ranging
from 1.5% to 3%, the COUs among the retiree demographic ranged from 7.2% to
13.6% of total opioid users for each facility. This suggests that age and military
experience are possibly highly influential factors.
Table 2. Percentage of Opioid Users and COUs for Each Facility.
Source: Levy et al. (2014).

                               NH Camp Lejeune  NH Camp Pendleton  NMC San Diego  NMC Portsmouth
AD population                           60,060             56,043         73,012          87,752
Opioid users, number                    17,326             15,496         23,289          22,285
Opioid users, % of population             28.9               27.7           31.9            25.4
COUs, number                               517                283            357             365
COUs, % of opioid users                    3.0                1.8            1.5             1.6


CNA  provided  the  following  findings  and  recommendations  relevant  to  this  study 
(Levy  et  al.,  2014). 

•  AD  personnel  are  less  likely  to  become  chronic  users  compared  to 
dependents  and  retiree  patients. 

•  Users  of  anti-depressants  are  much  more  likely  to  be  chronic  users. 

• Those prescribed an NSAID such as ibuprofen or aspirin initially, before
being prescribed opioid therapy, are less likely to be chronic users.

•  A  higher  percentage  of  COUs  are  chronic  lower  back  pain  patients  versus 
patients  with  acute  lower  back  pain. 


•  Patients  who  receive  prescriptions  entirely  in  the  direct  care  system  or 
entirely  in  purchased  care  are  less  likely  to  be  COUs  than  those  who 
receive  prescriptions  in  both  systems. 
C. PTSD, PAIN AND OPIOIDS

Based  on  recent  research,  there  is  a  clear  connection  between  chronic  pain  and 
PTSD.  The  VA  reported  that  51%  of  patients  with  chronic  lower  back  pain  also  had 
PTSD  symptoms  (“PTSD,”  2015).  In  another  study,  over  50%  of  Iraq  and  Afghanistan 
veterans  diagnosed  with  PTSD  also  received  one  or  more  chronic  pain  diagnoses 
(Seal,  2014).  Seal’s  research  suggested  that  there  is  evidence  that  chronic  pain  is  more 
prevalent in female veterans who recently returned from combat. Figure 2 compares
returning veterans from Iraq and Afghanistan who have pain diagnoses across three groups:
those with no mental health diagnosis, those with a mental health diagnosis (excluding
PTSD) and those with a PTSD diagnosis. The red bars are larger, depicting the prevalence of
chronic pain in those diagnosed with PTSD.


[Figure 2 image: "Pain and Mental Health Comorbidity in Iraq and Afghanistan Veterans."
Pain categories shown: Headaches/Migraine, Back Pain, Neck Pain, Arthritis/Joint Pain.
Seal, 2013.]


Figure 2. PTSD and Chronic Pain. Source: Seal (2014).
There are several medical hypotheses for the link between PTSD and pain. Seal
presents  a  very  compelling  theory  of  mutual  maintenance.  Because  PTSD  creates  an 
anxiety  state,  the  person’s  pain  perception  is  increased.  As  the  perception  of  pain  is 
exacerbated,  into  possibly  chronic  pain,  this  leads  to  increased  disability.  This,  in  turn, 
drives  the  person  to  perceive  their  pain  to  be  even  worse,  which  feeds  back  into  the 
symptoms  of  PTSD.  Figure  3  illustrates  this  cycle. 

[Figure 3 image: "Chronic pain and PTSD: Mutual Maintenance."
1. Anxiety/hyperarousal → increased pain perception
2. PTSD re-experiencing evokes pain
3. Avoidance/reduced activity → disability]


Figure  3.  The  Mutual  Maintenance  Cycle.  Source:  Seal  (2014). 


Thus,  a  logical  follow-on  is  to  examine  the  link  between  PTSD  and  opioid  use. 
Seal’s  presentation  (2014)  references  research  that  shows  patients  with  both  pain  and 
PTSD  are  more  likely  to  be  prescribed  opioids  than  patients  with  pain  but  no  PTSD 
diagnosis. In her study of Iraq and Afghanistan veterans, she found that those with PTSD
are over two and a half times more likely to be prescribed opioids than those patients with
no mental health diagnosis. Additionally, Seal concluded that, of the PTSD-diagnosed
veterans, those with severe PTSD symptoms are more likely to receive prescription
opioids for their pain. This conclusion places this group of patients at even greater
risk of adverse effects from opioid use.
III. DATA


This  chapter  describes  the  data  set  analyzed  in  the  study,  the  data  preparation 
process  and  the  description  of  the  response  and  explanatory  variables  with  some  initial 
exploratory analysis. The variables described in this chapter are used in Chapter IV to
construct models to estimate the probability that an individual in the AD military
sub-population who has been prescribed an opioid at least once is a high opioid user.

The response variable, constructed from the data provided and described in this
chapter, is a binary variable indicating whether or not an individual is a high user of
opioids. All individuals in the data studied have at least one
prescription for opioids.

There  are  91  explanatory  variables  available  directly  from  the  data  provided  by 
BUMED.  They  can  be  categorized  into  three  types: 

• Eighty binary medical risk diagnosis variables (given in Appendix B)
indicating whether the individual has or has not been diagnosed with the
condition.

•  Five  binary  variables  assigning  prescription  source  as  direct  care,  managed 
care  support  contractor,  theater  medical  data  store,  TRICARE  mail-order 
pharmacy  and  VA  clinical/health  data  repository. 

•  Six  variables  pertaining  to  the  patient’s  history  that  may  have  an  influence 
on  the  patient’s  opioid-use. 

The response and explanatory variables are discussed in greater detail in Section
C of this chapter. Additionally, a brief description of and statistics on some of the
explanatory variables are included in Section D.

A.  DATA  SOURCE/CLEANING 

The data used for this study was obtained from the BUMED Analytics/Enterprise
Support Services Department. The data contains information on AD patients assigned to
Puget Sound area Naval MTFs who received at least one opioid prescription in either
FY2014 or FY2015. The data was received in the form of two files, one for each fiscal
year. Each file contains two spreadsheets, one with PDTS records and one containing the
patient risk file.

The  PDTS  spreadsheets  have  7066  rows  and  8066  rows  for  FY2014  and  FY2015 
respectively.  Each  row  corresponds  to  a  single  opioid  prescription  issued  in  that  year. 
The  PDTS  spreadsheets  have  three  relevant  fields: 

•  Pseudo  ID 

•  Opioid  issue  date 

•  Source  of  opioid 

Each  patient  is  identified  by  their  pseudo  ID,  assigned  by  BUMED  to  ensure 
patient  confidentiality.  The  response  variable  and  the  five  binary  variables  assigning  a 
prescription  source  are  constructed  from  this  file. 

The  risk  spreadsheets  have  2889  rows  for  FY2014  and  3742  rows  for  FY2015. 
Each  row  corresponds  to  a  patient  who  had  at  least  one  opioid  prescription  issued  in  that 
year.  There  were  57  duplicate  pseudo  IDs  for  the  FY2014  risk  file  and  48  duplicate 
pseudo  IDs  for  the  FY2015  risk  file.  The  files  contain  86  fields  for  the  explanatory 
variables  (including  the  80  medical  diagnosis  fields  plus  six  others)  and  a  field  containing 
the  ID. 

Each of the four spreadsheets is exported to a comma-separated value file and
imported into the R programming environment for further manipulation (R Core
Team, 2016). The two PDTS files are combined before constructing the high user
response variable based on BUMED’s definition. Specifically, we define a high user to
be any patient who is prescribed five or more opioids within 90 days based on the
combined two years of PDTS records. From the combined PDTS files, we construct a
single PDTS user output file with one row per unique pseudo ID and columns for pseudo
ID, the minimum number of days spanned by any sequence of five prescriptions (for
individuals with at least five prescriptions), and an indicator of whether the patient
was a high user. Figure 4 shows a histogram of the minimum number of days between a
sequence of five opioid prescriptions, for each patient who has five or more prescriptions.
The figure shows that only 649 patients have five or more prescriptions and only 235
patients, highlighted in red, have five or more prescriptions within a 90-day period,
accounting for 11.4% and 4.1%, respectively, of the total patients in the data set.
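The construction of the response variable can be illustrated with a minimal Python sketch. The study itself was carried out in R; the function names and the example dates below are hypothetical, and the sketch assumes that "within 90 days" means the five prescriptions span at most 90 days.

```python
from datetime import date

def min_span_of_five(issue_dates):
    """Smallest number of days spanned by any five consecutive prescriptions.

    Returns None when the patient has fewer than five prescriptions.
    """
    ordered = sorted(issue_dates)
    if len(ordered) < 5:
        return None
    # Slide a window of five prescriptions over the sorted issue dates.
    return min((ordered[i + 4] - ordered[i]).days
               for i in range(len(ordered) - 4))

def is_high_user(issue_dates, window_days=90):
    # Assumption: "within 90 days" is read as a span of at most 90 days.
    span = min_span_of_five(issue_dates)
    return span is not None and span <= window_days

# Hypothetical patient with six prescriptions across the two fiscal years.
example = [date(2014, 1, 5), date(2014, 1, 20), date(2014, 2, 10),
           date(2014, 3, 1), date(2014, 3, 25), date(2015, 6, 1)]
print(min_span_of_five(example), is_high_user(example))
```

Sorting first means only windows of five consecutive prescriptions need to be checked, since any other choice of five dates spans at least as many days.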


[Figure 4 image: histogram titled “Opioid Users with Five or More Prescriptions”;
x-axis: Number of Days Between Five Prescriptions, 0 to 720.]

Figure 4. A Histogram of the Minimum Number of Days between a Sequence of
Five Opioid Prescriptions among Opioid Users with at Least Five
Prescriptions in FY2014 and FY2015

The  two  risk  files  are  combined  and  then  merged  with  the  PDTS  user  output  file 
based  on  pseudo  IDs.  An  additional  column  is  added  to  annotate  the  fiscal  year  (FY2014 
or  FY2015)  of  the  source  risk  file.  The  newly  combined  output  file  contains  6,631 
entries  with  some  pseudo  IDs  appearing  multiple  times.  This  file  is  then  separated  by 
fiscal year. Duplicate IDs within each fiscal year are merged, with the patient assuming
the larger value for each explanatory variable. For example, if a pseudo ID appeared
three times in FY2014, with risk scores of one, two and three, the updated file would
contain the pseudo ID once, with a risk score of three. The higher value is adopted to
assume the worst-case patient characteristic. This decreases the size of the FY2014 and
FY2015 files to 2,831 and 3,692 patients, respectively.

A summary of the number of high users and the total number of records for each
fiscal year is provided in Table 3. There is a larger proportion of high users in FY2014,
0.063, than in FY2015, 0.041. Treating the two years’ worth of Puget Sound data as
samples from hypothetical FY2014 and FY2015 populations, a large-sample test rejects
the null hypothesis that the two years’ proportions are equal, with a p-value of
0.0006. We do not know why the proportions are different. There may have been a
change in how opioids are prescribed at the Puget Sound MTFs; however, there is no
evidence of policy changes that may have affected these numbers.

Table 3. Number of High Users for Each Fiscal Year

Risk File Source    Hi User    Non-Hi User    Total
FY2014              177        2654           2831
FY2015              153        3539           3692
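A large-sample comparison of two proportions can be sketched from the counts in Table 3. The pooled z-test below is one common form of such a test; the text does not say which variant was used, so the p-value from this sketch need not match the reported figure exactly. The function name is hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled large-sample z-test of H0: p1 == p2.

    Returns (z, two-sided p-value). Assumption: no continuity correction.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# High users and totals for FY2014 and FY2015, taken from Table 3.
z, p_value = two_proportion_z_test(177, 2831, 153, 3692)
print(round(z, 2), p_value)
```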

The two files, one for each fiscal year, are re-combined into a single file. To
ensure that this final data set only has unique pseudo IDs, the same merging process is
used. For the 843 patients with records in both the FY2014 and FY2015 risk files, the
larger value of each explanatory variable is used, resulting in a data set with 5680 total
patients.
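The merging rule (one row per pseudo ID, keeping the larger value of each explanatory variable) can be sketched as follows. This is an illustration in Python rather than the R code used in the study, and the field names `pseudo_id`, `risk_score` and `ptsd` are hypothetical.

```python
def merge_duplicates(rows, id_key="pseudo_id"):
    """Collapse rows that share an ID, keeping the largest value of each field."""
    merged = {}
    for row in rows:
        pid = row[id_key]
        if pid not in merged:
            merged[pid] = dict(row)  # first time this ID is seen
            continue
        kept = merged[pid]
        for field, value in row.items():
            if field != id_key:
                # Worst-case rule: adopt the larger of the two values.
                kept[field] = max(kept.get(field, value), value)
    return list(merged.values())

rows = [
    {"pseudo_id": "A1", "risk_score": 1, "ptsd": 0},
    {"pseudo_id": "A1", "risk_score": 2, "ptsd": 1},
    {"pseudo_id": "A1", "risk_score": 3, "ptsd": 0},
    {"pseudo_id": "B2", "risk_score": 0, "ptsd": 0},
]
merged = merge_duplicates(rows)
print(merged)
```

For binary diagnosis indicators, taking the maximum is the same as recording the condition if it appears in any of the duplicate rows.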

B.  TRAINING  AND  TEST  SETS 

We  randomly  split  our  data  into  training  and  test  sets,  with  75%  for  the  training 
set  and  25%  for  the  test  set.  The  training  set  is  used  (in  Chapter  IV)  to  fit  a  number  of 
different  types  of  models,  from  which  we  will  choose  the  “best.”  The  test  set  is  set  aside 
until  after  the  model  fitting  is  complete  and  used  to  obtain  unbiased  estimates  of 
measures  of  model  performance.  Selection  of  the  training  and  test  sets  is  stratified  by 
non-high  and  high  users  so  that  the  same  ratio  of  high  users  is  found  in  both  the  training 
and  the  test  set.  Table  4  summarizes  the  number  of  opioid  users  in  the  training  and 
test  sets. 
Table 4. Number of High and Non-High Users in the Training and Test Sets

                    Training Set    Test Set    Total
High User           176             59          235
Non-High User       4083            1362        5445
Total               4259            1421        5680
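A stratified 75/25 split of this kind can be sketched as below. The function name, the seed and the choice to round each class's training count down are assumptions made for illustration; with the class sizes from Table 4, rounding down gives the same counts as the table.

```python
import random

def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Assumption: the training share of each class is rounded down.
        cut = int(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return train_idx, test_idx

# Class sizes from Table 4: 235 high users (1) and 5445 non-high users (0).
labels = [1] * 235 + [0] * 5445
train_idx, test_idx = stratified_split(labels)
train_high = sum(labels[i] for i in train_idx)
test_high = sum(labels[i] for i in test_idx)
print(len(train_idx), train_high, len(test_idx), test_high)
```

Splitting within each class keeps the share of high users at roughly 4.1% in both sets, which matters when the positive class is this rare.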

C.  RESPONSE  AND  EXPLANATORY  VARIABLES 

The response variable used in the analysis for the models is binary: 1 indicating a
high user of opioids and 0 if not. The criterion for determining whether a patient is a
high opioid user follows BUMED’s adopted definition of five or more dispensing events
of a medication likely to be associated with pain during a 90-day period. This indicator
response variable is generated from the PDTS and merged with the risk file according to
the pseudo IDs.

There are 88 variables eventually used in the analysis to fit the models.
Indicators for the presence of medical risk conditions make up 77 of these variables.
Appendix B lists these medical risk conditions. These conditions are selected directly
from the M2 health risk
conditions  category  file.  The  risk  conditions  in  M2  are  based  on  the  Wakely  Risk 
Assessment  (WRA)  model  that  maps  over  17,000  International  Classification  of  Diseases 
(ICD)  volume  9  diagnosis  codes  to  90  condition  categories  (Mehmud,  2012).  Appendix 
C  lists  the  WRA  condition  categories.  BUMED  selected  66  of  the  90  condition 
categories  that  may  be  relevant  to  this  study.  These  are  shown  in  Appendix  C.  The  M2 
medical  conditions  file  contains  13  sub-categories  not  included  in  the  WRA,  that  better 
reflect  the  military  population’s  common  illnesses.  These  are  annotated  in  Appendix  B 
and  mostly  pertain  to  mental  health  conditions  like  PTSD,  neurotic  disorders  and 
disturbance  of  conduct.  Additionally,  the  following  medical  conditions  were  eliminated 
as  possible  factors  to  simplify  our  model  because  no  patient  in  the  data  set  possessed 
these  diagnoses: 

•  Cystic  fibrosis 

•  Disease  of  the  blood  (high) 

•  Neoplasm  cancer  (very  high) 

The remaining eleven variables comprise the five possible sources of a
prescription and six patient characteristics. The six patient characteristics are a
categorical variable for age, presence of acute reaction to stress, a risk score, number of
days since most recent Overseas Contingency Operation (OCO) deployment, a binary
variable indicating whether the individual was ever deployed and a case management (CM)
acuity level. Related studies indicate that some of these characteristics may
affect a patient’s opioid usage (Seal, 2014).

D.  DESCRIPTIVE  STATISTICS 

This section gives a brief description of, and basic statistics for, the explanatory
variables and their relationship to the response variable. These include the patient
characteristic variables, the five prescription source variables and a handful of medical
risk diagnosis variables. Because the exploratory analysis is part of the model fitting
process, the analysis in this section is based only on the 4259 entries of the training set.

(1)  Age  Group  Category 

Rather than give an age in years, the explanatory variable “Age Group Category,”
taken directly from the M2 risk file, assigns a letter code, D-G, to patients whose ages
are 18-24, 25-34, 35-44 and 45-64, respectively. Table 5 shows the number of high users
for each age group. Although Category E has the greatest number of opioid users,
patients in Category G have the largest proportion of high users, and the proportion of
high users increases with each successive age group. Additionally, the training set
contains 51 entries that did not have an assigned code.


Table 5. Percentage of High Users by Age Group Categories

Ages (Code Category)    Hi User (%)    Non-Hi User (%)    Total
18-24 (D)               31 (2.7%)      1138 (97.3%)       1169
25-34 (E)               65 (3.7%)      1699 (96.3%)       1764
35-44 (F)               56 (5.4%)      982 (94.6%)        1038
45-64 (G)               19 (8.0%)      218 (92.0%)        237
No Assigned Code        5 (9.8%)       46 (90.2%)         51
Total                   176 (4.1%)     4083 (95.9%)       4259

(2) Acute Reaction to Stress

The factor acute reaction to stress takes two values, “yes” and “no.” An acute
reaction to stress is defined as “a psychological condition arising in response to a
terrifying or traumatic event” (Kenny, 2013). These events can range from sexual
assaults to extreme experiences from war conflicts. As a result, military personnel can
be at greater risk. Only nine entries were assigned this diagnosis, as shown in Table 6.
Therefore, this factor is unlikely to influence our model.


Table 6. Percentage of High Users by Acute Reaction to Stress

Acute Reaction    Hi User (%)    Non-Hi User (%)    Total
Yes               1 (11.1%)      8 (88.9%)          9
No                175 (4.1%)     4075 (95.9%)       4250
Total             176 (4.1%)     4083 (95.9%)       4259

(3)  Risk  Score 

The  risk  score  describes  the  person's  expected  relative  cost  in  medical  resources 
based  on  the  diagnoses  and  drugs  accumulated  within  the  reporting  period  (DHA,  2016). 
The  lower  the  score,  the  less  risk  for  the  patient.  A  score  of  one  means  the  individual  is 
at normal risk. This risk score is not truncated, so there is no upper bound. In the
training set, the score ranged from zero to 47. Because only 186 patients have a
risk score of five or greater, we assign these patients to a single category. Table 7 lists
the  number  and  percentage  of  high  users  by  risk  score.  Figure  5  shows  that  as  the  risk 
increases,  the  proportion  of  high  users  of  opioids  also  increases.  The  red  lines  indicate 
the  standard  error  bars  for  the  proportion  of  high  users  in  each  risk  score  category. 

Table 7. Percentage of High Users by Risk Score Category

Risk Score      Hi User (%)    Non-Hi User (%)    Total
0               4 (0.4%)       1097 (99.6%)       1101
1               56 (2.9%)      1883 (97.1%)       1939
2               38 (6.2%)      578 (93.8%)        616
3               18 (6.4%)      265 (93.6%)        283
4               16 (11.9%)     118 (88.1%)        134
5 or greater    44 (23.7%)     142 (76.3%)        186
Total           176 (4.1%)     4083 (95.9%)       4259

[Figure 5 image: plot titled “Proportion of High Users by Risk Score Category.”]

Figure 5. Proportion of High Users in Each Risk Score Category
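The proportions and standard error bars of the kind plotted in Figure 5 can be reproduced from the counts in Table 7. The sketch below assumes the usual binomial standard error, sqrt(p(1 - p)/n); the text does not state which formula was used for the error bars, and the names are hypothetical.

```python
import math

# (high users, total patients) per risk score category, from Table 7.
table7 = {"0": (4, 1101), "1": (56, 1939), "2": (38, 616),
          "3": (18, 283), "4": (16, 134), "5+": (44, 186)}

def proportion_with_se(successes, n):
    """Sample proportion and its binomial standard error sqrt(p(1-p)/n)."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)

summary = {cat: proportion_with_se(h, n) for cat, (h, n) in table7.items()}
for cat, (p, se) in summary.items():
    print(f"risk score {cat}: proportion={p:.3f}, se={se:.3f}")
```

The standard errors widen in the higher categories because those categories contain far fewer patients.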

(4) Ever Deployed For OCO Deployment

Studies such as Seal (2014) that link opioid use with deployments to Iraq and
Afghanistan are not uncommon. While the percentage of high users in Table 8 is higher
for patients with an OCO deployment, the difference is only 1.5 percentage points.


Table 8. Percentage of High Users by Ever Deployed OCO

Ever Deployed OCO    Hi User (%)    Non-Hi User (%)    Total
Yes                  113 (4.8%)     2243 (95.2%)       2356
No                   63 (3.3%)      1840 (96.7%)       1903
Total                176 (4.1%)     4083 (95.9%)       4259

(5)  CM  Acuity  Level 

Most patients do not have an assigned CM acuity level, because 91% of patients
in the data set have not been assigned a case manager. Case managers assign a score of
one to five, with a higher score indicating a patient with more complex health issues who
therefore requires greater medical oversight. Appendix D explains the scoring in greater
detail. The statistics in Table 9 suggest that there may be a relationship between acuity
level and high opioid use; the p-value is 0.0682 based on Fisher’s Exact Test (McDonald,
2009).


Table 9. Percentage of High Users by CM Acuity Level

CM Acuity Level    Hi User (%)    Non-Hi User (%)    Total
0                  129 (3.3%)     3766 (96.7%)       3895
1                  24 (11.1%)     192 (88.9%)        216
2                  13 (12.0%)     95 (88.0%)         108
3                  8 (25.8%)      23 (74.2%)         31
4                  0 (0.0%)       4 (100.0%)         4
5                  2 (40.0%)      3 (60.0%)          5
Total              176 (4.1%)     4083 (95.9%)       4259

(6) Source of Prescription

There  are  five  prescription  source  categories,  each  corresponding  to  a  logical 
variable  that  takes  value  “TRUE”  if  at  least  one  of  the  prescriptions  for  a  particular 
patient  comes  from  that  source  and  value  “FALSE”  otherwise.  The  five  sources  and  their 
codes  are: 

•  D  =  Direct  Care  (includes  VA  mail  order  pharmacy  refills  made  on  behalf 
of  participating  MTFs) 

•  M  =  Managed  Care  Support  Contractor  (MCSC) 

•  R  =  Theater  Medical  Data  Store 

•  T  =  TRICARE’s  Mail  Order  Program 

•  V  =  VA  CHDR  (Clinical/Health  Data  Repository — Prescription  drug 
information  for  dual  MHS/VA  eligible  beneficiaries — fully  funded  by  the 
VA) 

The  statistics  in  Table  10  indicate  that  a  majority  of  high  opioid  users  received 
their  prescriptions  from  direct  care.  But  a  greater  percentage  of  opioid  users  who  receive 
their  medication  from  the  MCSC  are  high  users.  Table  10  also  shows  that  the  number  of 
high  users  from  the  other  three  sources  is  quite  small. 


Table 10. Percentage of High Users by Source of Prescription

Prescription Source    Hi User (%)    Non-Hi User (%)    Total
D                      170 (4.4%)     3726 (95.6%)       3896
M                      87 (11.5%)     671 (88.5%)        758
R                      2 (3.1%)       62 (96.9%)         64
T                      0 (0.0%)       2 (100.0%)         2
V                      1 (25.0%)      3 (75.0%)          4

The CNA study by Levy et al. (2014) noted that patients who received
prescriptions from more than one source are more likely to become chronic users. Table
11 examines the relationship between the number of prescription sources and high opioid
usage. We also noted that, of the 84 high users who had two or more prescription
sources, 82 used the direct care and MCSC sources. In our models we replace the
five binary variables that indicate prescription source with a single variable, number of
sources, that takes the value “1” if the number of sources is one and “2” otherwise.


Table 11. Opioid Usage Based on Number of Prescription Sources

Number of Sources    Hi User (%)    Non-Hi User (%)    Total
One                  92 (2.4%)      3706 (97.6%)       3798
Two or More          84 (18.2%)     377 (81.8%)        461
Total                176 (4.1%)     4083 (95.9%)       4259
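Collapsing the five source indicators into the single number-of-sources variable can be sketched as follows. The dictionary layout and function name are hypothetical; the sketch assumes every patient has at least one source, since every patient in the data has at least one opioid prescription.

```python
SOURCE_CODES = ("D", "M", "R", "T", "V")

def number_of_sources(flags):
    """Map five TRUE/FALSE source flags to 1 (one source) or 2 (two or more)."""
    count = sum(bool(flags.get(code, False)) for code in SOURCE_CODES)
    if count == 0:
        # Assumption: a patient with no source flag indicates a data error.
        raise ValueError("expected at least one prescription source")
    return 1 if count == 1 else 2

print(number_of_sources({"D": True}))             # direct care only
print(number_of_sources({"D": True, "M": True}))  # direct care and MCSC
```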

(7)  Medical  Diagnoses 

Seal (2014) concluded that pain, mental health disorders and PTSD are linked
with opioid use. The following seven medical risk diagnoses are examined
more closely in Table 12, since these diagnoses can be associated with pain or mental
health disorders.

•  Anxiety  Related  Disorders 

•  Dorsopathy 

•  Fracture/Dislocation 

•  Endocrine/Metabolic/Immunity  Disorders  (EMI) 

•  Bone/Joint/Muscle  Infections/Necrosis  (BJMIN) 

•  Arthropathy 

•  PTSD 

Table 12. Percentage of High Users by Diagnoses

Diagnoses                        Hi User (%)    Non-Hi User (%)    Total
(Y) Anxiety Disorders            38 (14.7%)     221 (85.3%)        259
(N) Anxiety Disorders            138 (3.5%)     3862 (96.5%)       4000
(Y) Dorsopathy High              41 (20.2%)     162 (79.8%)        203
(N) Dorsopathy High              135 (3.3%)     3921 (96.7%)       4056
(Y) Fracture/Dislocation Low     50 (7.2%)      640 (92.8%)        690
(N) Fracture/Dislocation Low     126 (3.5%)     3443 (96.5%)       3569
(Y) EMI Disorder Low             31 (8.7%)      324 (91.3%)        355
(N) EMI Disorder Low             145 (3.7%)     3759 (96.3%)       3904
(Y) BJMIN                        79 (10.7%)     657 (89.3%)        736
(N) BJMIN                        97 (2.7%)      3426 (97.2%)       3523
(Y) Arthropathy                  80 (8.8%)      831 (91.2%)        911
(N) Arthropathy                  96 (2.9%)      3252 (97.1%)       3348
(Y) PTSD                         25 (18.1%)     113 (81.9%)        138
(N) PTSD                         151 (3.6%)     3970 (96.3%)       4121

(Y) - Presence of condition
(N) - Absence of condition


Table 12 shows that, for all of the above conditions, the percentage of high users
is higher among patients with the stated condition. The difference varies from 3.7
percentage points for the fracture/dislocation condition to 16.9 percentage points for
the dorsopathy condition. However, some of these diagnoses may not have a significant
impact in predicting high opioid use due to the low number of patients diagnosed with
that condition.

Additionally, since there are 77 medical risk conditions used in the analysis for
the model, we explore whether the number of conditions present is related to high
opioid usage. The red line in Figure 6 is a loess smoother (see Faraway (2015) for a
description of loess smoothers) of the proportion of high users as a function of the
number of diagnoses. It depicts an increasing relationship: the proportion of high users
rises as the number of diagnosed conditions increases. The blue dots indicate the
proportion of high users with exactly that number of diagnoses. The gray dots in Figure 6
depict the binary response variable with random noise added to both the binary response
variable and the number of diagnoses to avoid overlap of points. Because the relationship
between the number of diagnoses and the proportion of high users is so strong, we also
include this constructed variable, number of diagnoses, in our modeling efforts.


[Figure 6 image: plot titled “Proportion of High Users by Number of Diagnoses”;
x-axis: Number of Diagnoses; y-axis: Proportion of High Utilizers.]

Figure 6. Proportion of High Users by Number of Diagnoses

IV. ANALYSIS AND RESULTS


The  goal  of  this  chapter  is  to  produce  a  model  to  estimate  the  probability  that  an 
opioid-using  individual  is  a  high  user  based  on  the  diagnoses  and  other  variables 
provided  by  BUMED  and  discussed  in  Chapter  III.  Estimated  probabilities  are  not 
intended  to  be  used  to  classify  individuals  as  high  users  or  not.  Instead,  they  give  a  score, 
much  like  a  credit  score,  to  aid  health  care  decisions.  This  chapter  fits  four  models  on  the 
training  data  set  and  compares  the  results.  The  four  models  are  a  basic  logistic  regression 
model  and  three  machine  learning  models:  the  elastic  net  penalized  logistic  regression, 
the  random  forest  and  the  boosted  tree  models.  The  basic  logistic  regression  model  uses 
only  a  few  explanatory  variables  selected  from  those  described  in  Chapter  III.  It  has  the 
advantage  of  being  easy  to  use  and  to  explain.  The  machine  learning  models,  on  the 
other  hand,  make  use  of  all  the  explanatory  variables.  Our  basic  model  is  compared  at 
the  end  of  this  chapter  with  the  best  model  from  the  other  three  methods.  For  those 
methods,  we  vary  the  parameters  and  choose  the  best  model  within  each  category  type 
using  cross-validated  binomial  deviance  to  avoid  over-fitting  (Hastie,  Tibshirani,  & 
Friedman,  2009).  We  analyze  the  ROC  curves  based  on  the  cross-validated  predictors  to 
choose  the  single  best  model  among  the  three  methods  and  then  finally  compare  it  with 
the  basic  model.  Plotting  a  lift  curve  on  the  test  set  will  allow  us  to  examine  unbiased 
estimates  of  model  performance  for  the  best  model  for  our  study. 

A.  BASIC  MODEL 

For  our  basic  high  user  model,  our  goal  is  to  create  a  simple  model  based  on 
variables  that  might  have  the  strongest  relationship  to  high  opioid  use.  The  standard 
linear  regression  model  is  not  appropriate  in  this  study  with  a  binary  response  variable. 
Instead,  we  use  logistic  regression,  which  is  a  generalization  of  linear  regression  for 
binary response variables (Faraway, 2015). Logistic regression models are easily
interpretable and are defined as follows: Let $n$ be the number of observations in the
training set and model the response variable as independent Bernoulli random variables
where the probability of a “high user,” $p_i$, for $i = 1, \ldots, n$ is related to the
explanatory variables through the log-odds as shown in the following equation:

$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik},$$

where $x_{i1}, \ldots, x_{ik}$ are the values of the $k$ explanatory variables for the
$i$th observation and $\beta_0, \ldots, \beta_k$ are the unknown parameters to be estimated.
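Solving the log-odds equation for the probability shows how a fitted model is turned into a risk score. Writing $\eta_i$ for the linear predictor (a symbol introduced here only for convenience), the standard inversion is:

```latex
\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik},
\qquad
p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}} = \frac{1}{1 + e^{-\eta_i}} .
```

Because this function is increasing in $\eta_i$, a positive coefficient always raises the estimated probability of being a high user, although the size of the change in probability depends on the values of the other variables.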

Levy et al. (2014) found that patients who used multiple sources for their
opioid prescriptions were more likely to be chronic users. Further, based on our
analysis  in  Chapter  III,  the  number  of  diagnoses  is  strongly  related  to  the  proportion  of 
high  users.  This  variable  sums  the  77  medical  risk  diagnoses  indicator  explanatory 
variables  to  produce  a  single  variable.  Additionally,  we  consider  the  explanatory 
variables  of  risk  score  and  CM  acuity  level  in  developing  our  basic  model.  Two  variables 
that  we  considered,  but  did  not  include  in  our  basic  model  fit  are  the  age  group  code  and 
the  number  of  days  since  most  recent  OCO  deployment.  Neither  variable  improved  the 
basic  model  fit. 

In  Chapter  III,  we  created  two  new  explanatory  variables:  the  total  number  of 
prescription  sources  and  the  total  number  of  medical  diagnoses  for  each  opioid  user.  We 
will  begin  with  a  model  that  includes  both  of  these  explanatory  variables.  The  fitted 
logistic  regression  model  has  the  form: 


$$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = -5.99 + 1.48 x_1 + 0.23 x_2,$$

where $\hat{p}$ is the estimated probability of high use for an individual, $x_1$ is
either one, representing one source of prescription, or two, representing two or more
sources of prescription, and $x_2$ represents the total number of diagnoses. The summary
statistics for this logistic regression model fit are given in Table 13. The z-values, or
Wald statistics, and corresponding p-values are for a large-sample test of the null
hypothesis that each coefficient is zero against the two-sided alternative that it is
not (Hastie et al., 2009). There is strong evidence that the coefficient for the number of
sources is not zero, and likewise for the coefficient of the number of
diagnoses. Thus, both explanatory variables should be used in our basic model. The null
and  residual  deviance  for  this  model  fit  are  1466.2  on  4258  degrees  of  freedom  and 
1184.3  on  4256  degrees  of  freedom.  The  null  deviance  and  the  residual  deviance  are 
analogous  to  the  total  sum  of  squares  and  residual  sums  of  squares  for  linear  regression 
model  fits  and  are  often  used  to  compute  an  R-squared  value  (Faraway,  2015).  Here  the 
R-squared  value  is  0.24  with  the  interpretation  that  only  24%  of  the  null  deviance  is 
explained  by  the  logistic  regression  fit  with  two  variables. 


Table 13. Summary Statistics for Logistic Regression with Two Variables

                       Estimated      Standard      Wald
                       Coefficient    Error (SE)    Statistic    P-value
(Intercept)            -5.99          0.25          -23.50       <0.001
Number of Sources      1.48           0.18          8.34         <0.001
Number of Diagnoses    0.23           0.02          11.19        <0.001

The values in Table 13 are on the scale of the log-odds of being a high user.
Exponentiating the coefficients yields the odds ratios. Subjects with multiple sources of
prescriptions have nearly 4.5 times the odds of being a high user compared with those
with only one source (Odds Ratio (OR) = 4.4; 95% Confidence Interval (CI) 3.1-6.2),
holding the number of diagnoses constant. Similarly, every additional diagnosis
increases the odds of being a high user (OR = 1.26; 95% CI 1.21-1.31), holding the number
of sources constant.
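The odds ratios and estimated probabilities can be approximated from the rounded values in Table 13. The sketch below assumes Wald-type intervals (estimate plus or minus 1.96 standard errors on the log-odds scale); the text does not state how the intervals were computed, and because the inputs are rounded, the results agree with the reported figures only approximately. The function names and the two example patients are hypothetical.

```python
import math

# Rounded estimates and standard errors from Table 13.
INTERCEPT = -5.99
COEF = {"sources": (1.48, 0.18), "diagnoses": (0.23, 0.02)}

def odds_ratio_ci(estimate, se, z=1.96):
    """Odds ratio with a Wald-type 95% interval from a log-odds coefficient."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))

def high_user_probability(n_sources, n_diagnoses):
    """Estimated probability; n_sources is 1 (one source) or 2 (two or more)."""
    log_odds = (INTERCEPT
                + COEF["sources"][0] * n_sources
                + COEF["diagnoses"][0] * n_diagnoses)
    return 1.0 / (1.0 + math.exp(-log_odds))

for name, (est, se) in COEF.items():
    print(name, [round(v, 2) for v in odds_ratio_ci(est, se)])

# Two hypothetical patients: one source and 3 diagnoses, versus
# two or more sources and 10 diagnoses.
print(high_user_probability(1, 3), high_user_probability(2, 10))
```

Used this way, the model gives each patient a score between zero and one that can be ranked, rather than a yes/no classification.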

To check that this is a reasonable basic model, we added the interaction between
the two explanatory variables, the number of diagnoses and the number of prescription
sources. The large-sample likelihood ratio test (LRT) of the null model without
interaction against the alternative with interaction gives a p-value of 0.037 with one
degree of freedom. There is evidence of interaction at the 5% level of significance, but
not at the 1% level. With the large sample size, the interaction may be statistically
significant without being practically significant. Choosing the model with the interaction
term would require a clear improvement in performance. To see if there is a practical
difference in the performance of the two model fits, we compare their ROC curves.
Common measures of performance used for models with a binary response variable are
the true positive rate and false positive rate, where an individual is classified as
“positive” if the model’s estimated probability of positive is greater than a specified
threshold (0.50 is a typical threshold value). Here positive corresponds to high opioid
use. Rather than use a single threshold value, the ROC curve, computed on the training
data, plots the true positive rate against the false positive rate for different threshold
levels (James, Witten, Hastie, & Tibshirani, 2013). In Figure 7, the ROC curve for the
model with interactions lies on top of the curve for the model without interactions. Since
the model with interactions does not notably improve on the simpler model without
interactions, we choose the model without interactions.
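The way an ROC curve sweeps over thresholds can be sketched in a few lines of Python. The scores and outcomes below are hypothetical; the area-under-the-curve helper is added only as a convenient one-number summary and is not a measure reported in the text. The sketch classifies a score as positive when it is greater than or equal to the threshold, so that each observed score can itself serve as a threshold.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nobody is positive
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical estimated probabilities and outcomes (1 = high user).
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(points, auc(points))
```

Two models whose curves lie on top of one another, as in Figure 7, trade off true and false positives in essentially the same way at every threshold.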


[Figure 7 image: ROC plot titled “Baseline Models With and Without Interactions.”]

Figure 7. ROC Curves Comparing Models with and without Interactions


To  check  that  the  log-odds  of  high  opioid  use  can  be  modeled  as  linear  in  the 
number  of  diagnoses,  we  converted  the  explanatory  variable,  number  of  diagnoses  from 
numeric  to  categorical  and  compared  the  model  fit  with  our  current  basic  model.  The 
ROC  curves  for  the  two  models  plotted  in  Figure  8  look  very  similar,  thus  the  simpler 
model,  based  on  numeric  rather  than  categorical  version  of  the  number  of  diagnoses 
variable  remains  our  basic  model. 


[Figure 8 image: ROC plot titled “Baseline Model Numeric Vs Categorical Variable”;
x-axis: False positive rate.]

Figure 8. Comparison of Models with Number of Diagnoses Converted to Categorical

Additionally, we explored logistic regression with different combinations of the
following four explanatory variables, with and without interactions: number of
prescription sources, number of diagnoses, risk score and CM acuity level. As in the
earlier ROC curve comparisons, when we compared each model with our basic
model of two variables, our simple model had very similar, if not better, results. Thus, for
practical purposes, our basic model will contain only two variables: the number of
prescription sources and the number of diagnoses, with no interactions.

B. MACHINE LEARNING MODELS


In  this  section,  we  compare  the  three  machine  learning  models:  elastic  net 
penalized  logistic  regression,  random  forest  and  boosted  trees.  For  each  method,  we  vary 
the  parameters  and  use  cross-validated  binomial  deviance  to  choose  the  best  model  within 
each category type. In order to choose the single best machine learning model, we
analyze ROC curves based on cross-validated true positive and false positive rates.

1.  Elastic  Net  Penalized  Logistic  Regression  Model 

We use the cv.glmnet function to conduct 10-fold cross-validation. This function is part
of the glmnet package in R (Friedman, Hastie, & Tibshirani, 2010), which fits regularized
generalized linear models via penalized maximum likelihood. In these models, the penalties
are functions of the magnitudes of the explanatory variables’ coefficients. The three
choices of penalty are the least absolute shrinkage and selection operator (lasso), which
is an L1 absolute value penalty; ridge regression, which is an L2 quadratic penalty; and a
combination of the two called the elastic net (Goeman, 2010). How much the likelihood is
penalized is governed by a parameter λ, chosen by cross-validation.

Tibshirani (1996) suggests that an L1 lasso penalty performs better than the ridge penalty
when there is a small to moderate number of moderate-sized effects, even among a large
number of explanatory variables. An L2 ridge penalty performs best when there is a large
number of small effects, such as when there is substantial multicollinearity among the
explanatory variables. The advantage of the lasso penalty is that it acts as a variable
selection procedure by shrinking some coefficients exactly to zero. A shortcoming of the
lasso is that when a group of variables is highly correlated, it tends to select only one
variable from that group.
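The contrast between the two penalties can be illustrated in the textbook special case of
an orthonormal design, where the lasso estimate is a soft-thresholded version of the
least-squares coefficient and the ridge estimate is a proportional shrinkage of it. The
following is a minimal sketch under that assumption only; the coefficient values and the
penalty `lam` are hypothetical and are not taken from the study.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: lasso estimate for an orthonormal design."""
    magnitude = max(abs(b) - lam, 0.0)
    return magnitude if b >= 0 else -magnitude


def ridge_shrink(b, lam):
    """Proportional shrinkage: ridge estimate for an orthonormal design."""
    return b / (1.0 + lam)


# Hypothetical least-squares coefficients (illustration only, not study results).
ols = [2.0, 0.3, -0.1, -1.5]
lam = 0.5

lasso = [lasso_shrink(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

# The lasso sets small coefficients exactly to zero; ridge only shrinks them.
n_selected_by_lasso = sum(1 for b in lasso if b != 0.0)
n_nonzero_ridge = sum(1 for b in ridge if b != 0.0)
```

In this sketch the two small coefficients are removed by the lasso, while all four remain
non-zero under ridge, which is the variable selection behavior described above.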

To use the glmnet package, all explanatory variables must be numeric. We converted three
categorical variables (CM acuity level, risk score and age group code) to numeric
variables. The CM acuity levels were consolidated into a new binary variable, where “1”
indicates that the patient was assigned an acuity level and “0” indicates that the patient
was not. The age group code variable was converted from categorical with levels “D,” “E,”
“F,” “G” to numeric with values 1, 2, 3 and 4, respectively. Lastly, the risk score, with
levels 0, 1, 2, 3, 4 and 5+, was converted to a numeric variable with values 1 to 6.
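The recoding described above can be sketched as a pair of lookup tables. This is only an
illustration of the mapping stated in the text; the function name, the field names and
the use of `None` for "no acuity level assigned" are assumptions made for the example.

```python
# Mappings taken from the text: age group D..G -> 1..4, risk score 0..5+ -> 1..6.
AGE_GROUP_CODES = {"D": 1, "E": 2, "F": 3, "G": 4}
RISK_SCORE_CODES = {"0": 1, "1": 2, "2": 3, "3": 4, "4": 5, "5+": 6}


def encode_patient(age_group, risk_score, acuity_level):
    """Return numeric versions of the three categorical variables.

    acuity_level is None when no CM acuity level was assigned (an assumption
    about how missing assignments are represented).
    """
    return {
        "age_group": AGE_GROUP_CODES[age_group],
        "risk_score": RISK_SCORE_CODES[risk_score],
        "has_acuity": 0 if acuity_level is None else 1,
    }


# Hypothetical patient record, for illustration only.
example = encode_patient("F", "5+", None)
```

Treating the age group and risk score as equally spaced integers is itself a modeling
choice, since it assumes that each step between adjacent levels has the same effect.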

To choose the parameter λ, we compute the cross-validated prediction error for
approximately 100 different values of λ. For cross-validation, we use K = 10 randomly
selected folds. The cv.glmnet function fits the model to nine folds and then predicts on
the remaining fold. This process yields a prediction error, CV_i, and is repeated K − 1
times, yielding the corresponding K − 1 prediction errors. The average of the ten
prediction errors is CV, the cross-validated prediction error given in the following
equation (Hastie et al., 2009).


$$\mathrm{CV} = \frac{1}{10} \sum_{i=1}^{10} \mathrm{CV}_i$$

We use the Bernoulli (or binomial) deviance as the measure of prediction error. The plot
in Figure 9 shows the cross-validated prediction error for the lasso logistic model as a
function of log(λ). The leftmost dotted vertical line marks the λ associated with the
minimum cross-validated prediction error. The dotted line to its right marks the largest λ
whose cross-validated error is within one standard error (SE) of the minimum. This “one-SE
rule” λ tends to give a simpler model. The numbers across the top are the numbers of
variables with non-zero coefficients at the corresponding values of λ.
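The selection procedure can be sketched in a few lines. The code below is not the
cv.glmnet implementation; it only illustrates the mean Bernoulli deviance, the averaging
of per-fold errors, and the one-SE rule as documented for glmnet's `lambda.1se` (the most
heavily penalized λ whose error is within one SE of the minimum). The per-fold errors are
hypothetical, and four folds are used instead of ten to keep the example short.

```python
import math
import statistics


def binomial_deviance(y_true, p_hat, eps=1e-15):
    """Mean Bernoulli deviance: -2 times the average log-likelihood."""
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total / len(y_true)


def cv_summary(fold_errors):
    """Average the per-fold errors and estimate the standard error of the mean."""
    k = len(fold_errors)
    mean = sum(fold_errors) / k
    se = statistics.stdev(fold_errors) / math.sqrt(k)
    return mean, se


def select_lambdas(cv_by_lambda):
    """Return (lambda at minimum CV error, one-SE rule lambda)."""
    summary = {lam: cv_summary(errs) for lam, errs in cv_by_lambda.items()}
    lam_min = min(summary, key=lambda lam: summary[lam][0])
    threshold = summary[lam_min][0] + summary[lam_min][1]
    lam_1se = max(lam for lam, (mean, _) in summary.items() if mean <= threshold)
    return lam_min, lam_1se


# Hypothetical per-fold deviances for four candidate values of lambda.
cv_by_lambda = {
    0.001: [0.30, 0.32, 0.29, 0.31],
    0.01: [0.29, 0.31, 0.28, 0.30],
    0.1: [0.30, 0.31, 0.29, 0.30],
    1.0: [0.40, 0.42, 0.39, 0.41],
}
lam_min, lam_1se = select_lambdas(cv_by_lambda)
```

With these made-up numbers the minimum is at λ = 0.01, while the one-SE rule moves to the
larger λ = 0.1, trading a negligible increase in error for a more heavily penalized model.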
Figure 9. Cross-Validation Error vs. log(λ) for Binomial GLM

We also fit logistic regression models with the elastic net penalty. Here the penalty is a
combination of the lasso and ridge penalties governed by a parameter α, where α = 1 is a
lasso penalty, α = 0 is a ridge penalty and 0 < α < 1 is a combination of the two. We fit
elastic net models varying α from zero to one in increments of 0.1. For each α, the one-SE
rule λ yielded the cross-validated prediction error given in Table 14. The smallest of
these is for α = 1, corresponding to a lasso penalty. Thus our best model among the
penalized logistic regression models, chosen among models using all possible explanatory
variables, is a model with three explanatory variables: the two variables in the basic
model and the medical diagnosis variable Dorsopathies (High).
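As an illustration of how the mixing parameter interpolates between the two penalties, the
sketch below evaluates the elastic net penalty in the parameterization documented for
glmnet, λ[(1 − α)/2 · Σβ² + α · Σ|β|]. The coefficient vector and the value of λ are
hypothetical and serve only to show the calculation.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic net penalty: lam * ((1 - alpha)/2 * sum(b^2) + alpha * sum(|b|))."""
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)


# Hypothetical coefficients and penalty strength (not study values).
coefs = [0.5, -1.0, 0.0]
lam = 0.1

# The same grid of mixing values described in the text: 0, 0.1, ..., 1.
alphas = [i / 10 for i in range(11)]
penalties = {a: elastic_net_penalty(coefs, lam, a) for a in alphas}
```

At α = 0 only the quadratic term contributes and at α = 1 only the absolute value term
contributes, so the grid spans the range from pure ridge to pure lasso.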
Table 14. Lowest CV while Varying α

α-value    CV
0          0.34329
0.1        0.32326
0.2        0.31193
0.3        0.30556
0.4        0.30165
0.5        0.29908
0.6        0.29728
0.7        0.29584
0.8        0.29458
0.9        0.29348
1          0.29261
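The choice of mixing parameter from Table 14 amounts to taking the row with the smallest
cross-validated error. A minimal sketch, using the values from the table (the variable
names are illustrative):

```python
# Cross-validated prediction error at the one-SE rule lambda, from Table 14.
cv_by_alpha = {
    0.0: 0.34329, 0.1: 0.32326, 0.2: 0.31193, 0.3: 0.30556,
    0.4: 0.30165, 0.5: 0.29908, 0.6: 0.29728, 0.7: 0.29584,
    0.8: 0.29458, 0.9: 0.29348, 1.0: 0.29261,
}

# The mixing value with the lowest cross-validated error.
best_alpha = min(cv_by_alpha, key=cv_by_alpha.get)

# The error decreases steadily as the penalty moves from ridge toward lasso.
ordered = [cv_by_alpha[a] for a in sorted(cv_by_alpha)]
is_decreasing = all(later < earlier for earlier, later in zip(ordered, ordered[1:]))
```

The table's errors fall monotonically as the mixing value increases, so the minimum lies
at the lasso end of the grid.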

2.  Tree-Based  Models 

We fit two machine learning models based on Breiman’s (2001) classification trees. See
James et al. (2013) for a good discussion of classification trees and related models. For
our modeling purposes, the greatest advantage of tree-based models is that they naturally
accommodate potential interactions among explanatory variables. In contrast, in logistic
regression models, interactions must be deliberately included as extra explanatory
variables. The two tree-based models considered in this section are random forests and
boosted trees.

a.  Random  Forest 

Random forests average the outcomes of multiple classification trees (Breiman, 2001). Each
tree is fit using a bootstrapped sample taken from the training set. During tree
construction, only mtry variables, randomly selected from the set of explanatory
variables, are considered at each split, where mtry is a parameter selected by the user.
We use the randomForest package (Liaw & Wiener, 2002) to fit an initial model while
varying the value of mtry. The default value of mtry is the square root of the number of
explanatory variables.
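The two sources of randomness described above can be sketched as follows. This is not the
randomForest implementation; the helper names, the toy row count and the placeholder
predictor names are assumptions, and the count of 91 predictors is borrowed from
Appendix B purely for illustration.

```python
import math
import random


def default_mtry(n_predictors):
    """Square root of the number of predictors, rounded down (classification default)."""
    return max(1, math.floor(math.sqrt(n_predictors)))


def bootstrap_indices(n_rows, rng):
    """Sample row indices with replacement to form one tree's training sample."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_predictors(predictor_names, mtry, rng):
    """Draw the mtry predictors that are eligible at a single split."""
    return rng.sample(predictor_names, mtry)


rng = random.Random(0)
predictors = [f"x{i}" for i in range(91)]  # placeholder names, not the study's variables
mtry = default_mtry(len(predictors))
rows_for_one_tree = bootstrap_indices(10, rng)
split_candidates = candidate_predictors(predictors, mtry, rng)
```

Under these assumptions the default would be nine candidate variables per split, which is
the starting point that the tuning of mtry then varies.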
This method also produces variable importance measures by averaging over the trees. These
measures indicate how much each explanatory variable contributes to explaining variation
in the response. Variable importance gives the average importance of each variable within
the model, based on the mean decrease in node impurity, or Gini score (Breiman, 2001).
Table 15 lists the most influential explanatory variables in the final random forest model
based on Gini score. The larger the value, the greater the role that explanatory variable
plays in partitioning the data (Witten, Frank, & Hall, 2011). All five of these variables
were explored while constructing our basic logistic regression model in Section A.


Table 15. Top Five Influential Variables

Explanatory Variable               Mean Decrease Gini
Number of Diagnoses                23.9215071
Days Since Most Recent OCO Depl    23.01504065
Risk Score                         17.51883699
Age Group Code                     15.07520383
Number of Sources                  13.25949064
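The quantity behind the Gini-based importance is the drop in node impurity produced by a
split. The sketch below computes that drop for a single split on a binary response; it
is a simplified illustration, not the randomForest calculation, which accumulates such
decreases over every split on a variable and averages over the trees. The labels are
made up.

```python
def gini(labels):
    """Gini impurity of a node with binary labels (1 = high user, 0 = not)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical node of eight patients split into two child nodes.
parent = [1, 1, 0, 0, 0, 0, 0, 0]
left = [1, 1, 0]
right = [0, 0, 0, 0, 0]
decrease = gini_decrease(parent, left, right)
```

A split that isolates the high users well, as in this toy example, yields a large
decrease, which is why variables used in such splits rank highly.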

In addition, we use the train function from the caret package in R to automate the search
for the best mtry (Kuhn, 2016). This function returns an object that contains the
performance values for each combination of model parameters specified. We use ten-fold
cross-validation to find the model with the lowest cross-validated log-loss (which is
proportional to the Bernoulli deviance). Figure 10 suggests that the minimum
cross-validated log-loss occurs between 10 and 16 randomly selected explanatory variables.
Figure 10. Cross-Validated Log-Loss Based on Number of Variables (plot title:
Cross-Validated LogLoss for up to 40 Randomly Selected Variables; x-axis: number of
randomly selected predictors)

We select 14 as our mtry value because it has the smallest cross-validated log-loss in
Table 16. A random forest also needs enough trees that every input row is predicted at
least a few times and that the classification error stabilizes (James et al., 2013). For
this reason, our best random forest model grows 500 trees and randomly selects 14
explanatory variables at each split.


Table 16. Log-Loss for Various mtry Values

mtry value    CV Log-loss
2             0.3843926
4             0.2690813
6             0.23666
8             0.2426391
10            0.2291392
12            0.2318257
14            0.2244436
16            0.2259266
18            0.2403286
20            0.2405139
22            0.2272155
24            0.2697607
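The selection in Table 16 can be reproduced by taking the row with the smallest log-loss.
The sketch below also defines the mean log-loss itself, which is half of the mean
Bernoulli deviance; the function and variable names are illustrative.

```python
import math


def log_loss(y_true, p_hat, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Cross-validated log-loss by mtry, from Table 16.
cv_log_loss = {
    2: 0.3843926, 4: 0.2690813, 6: 0.23666, 8: 0.2426391,
    10: 0.2291392, 12: 0.2318257, 14: 0.2244436, 16: 0.2259266,
    18: 0.2403286, 20: 0.2405139, 22: 0.2272155, 24: 0.2697607,
}
best_mtry = min(cv_log_loss, key=cv_log_loss.get)
```

The values for mtry between 10 and 16 differ only in the third decimal place, so the
choice of 14 over its neighbors reflects a small difference in estimated error.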
b. Boosted Tree


Like a random forest, a boosted tree model is a tree-based model whose prediction is a
linear combination of classification trees. However, the trees are grown sequentially,
each one improving on the prediction results of those before it (James et al., 2013). This
differs from a random forest in that the growth of a particular tree is influenced by the
performance of the trees that have already been grown. Thus, the trees may not need to be
as large, which helps with interpretability. The maximum tree depth corresponds to the
potential degree of interaction among explanatory variables. For example, trees with a
single split (depth one) are additive with no interactions, trees with depth two can
include up to two-way interactions, and so on. Each new tree is fit to the residuals from
the current model instead of to the response variable, and the process continues until a
specified number of trees has been created (James et al., 2013).
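The idea of fitting each tree to the residuals of the trees before it can be sketched with
a deliberately simplified example: least-squares boosting with single-split trees
(stumps) on one predictor. This is a stand-in for illustration and not the Bernoulli-loss
procedure used by the gbm package; the data, the function names and the parameter values
are all assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimizes the squared error of the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees, shrinkage):
    """Grow stumps one at a time, each fit to the residuals left by the others."""
    stumps = []
    residuals = list(ys)
    for _ in range(n_trees):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Only a shrunken fraction of each stump's fit is removed from the residuals.
        residuals = [r - shrinkage * predict_stump(stump, x)
                     for x, r in zip(xs, residuals)]
    return stumps


def predict(stumps, shrinkage, x):
    return sum(shrinkage * predict_stump(s, x) for s in stumps)


# Toy data (hypothetical): the response switches from 0 to 1 when x exceeds 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
shrinkage = 0.1
stumps = boost(xs, ys, n_trees=100, shrinkage=shrinkage)
```

Because each stump contributes only a tenth of its fit, many small steps are needed to
approach the target, which shows the trade-off between the shrinkage rate and the number
of trees.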

The tuning parameters for boosted tree models are the number of trees, the shrinkage or
learning rate, and the interaction depth. We use cross-validation to select an appropriate
number of trees, as overfitting can occur if this parameter is too large. For the
shrinkage parameter, we used a starting value of 0.001, as recommended by James et al.
(2013). We varied this parameter up to 0.05 to find combinations of shrinkage parameter
and number of trees that could achieve good performance. The third parameter, interaction
depth, controls the degree of interaction and adds complexity to the model. For example,
an interaction depth of 2 gives a model with up to two-way interactions. We varied this
parameter from 1 to 4 and, using cross-validation, found that complex interactions were
not needed.

We use the gbm function from the gbm package in R to construct our boosted tree model
(Ridgeway, 2015). To choose the optimal model, we use ten-fold cross-validation on the
Bernoulli deviance. The parameter values with the smallest cross-validated Bernoulli
deviance are 244 trees, a shrinkage rate of 0.05 and an interaction depth of two. This
model fit suggests that including two-way interactions among the explanatory variables
might improve model performance.
C. COMPARISON OF MODELS BUILT FROM MACHINE LEARNING APPROACHES

ROC curves are used to compare the performance of the three machine learning models in our
study: lasso penalized logistic regression, random forest and boosted trees. When the ROC
curves are computed on the training data, the random forest and boosted tree models
initially appear to outperform the lasso penalized regression. On further examination,
however, the two tree-based models had over-fit the training data and thus had inflated
performance. Figure 11 shows the random forest ROC curve based on the training set. It
indicates that the random forest model can predict high opioid use with almost 100%
accuracy on the training set. Unfortunately, these results do not generalize to an
independent data set. Therefore, to compare our machine learning models, we plot the
cross-validated ROC curves.
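An ROC curve is traced by lowering a classification threshold over the predicted scores
and recording the false positive and true positive rates at each step. The sketch below
shows that calculation and the area under the resulting curve. To obtain a cross-validated
curve, the scores passed in would be the out-of-fold predictions, so that no observation
is scored by a model that was fit to it. The labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points, highest score first."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical out-of-fold scores for eight patients (1 = high user).
y = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
points = roc_points(y, scores)
area = auc(points)
```

A model that had memorized its training data would place every high user ahead of every
other patient on that data and trace a curve hugging the top-left corner, which is the
pattern that signals over-fitting when it disappears on held-out folds.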


Figure 11. ROC Plot of Over-Fit Random Forest Model on Training Set (plot title: Best
Random Forest ROC Plot; x-axis: false positive rate)

Based on Figure 12, the lasso penalized logistic regression and the boosted tree model
performed similarly well on the cross-validated ROC curves and slightly better than the
random forest model. However, because the boosted model uses many trees and all of the
explanatory variables, it is considerably more complex than the lasso penalized
regression. For this reason, our choice for the best machine learning model is the lasso
penalized regression. We use the same two criteria, cross-validated ROC curve performance
and model simplicity, to compare our best machine learning model with the basic logistic
regression model.


Figure 12. Cross-Validated ROC Curves Comparing Machine Learning Models

D.  RESULTS  AND  DISCUSSION 

In this section, we compare the lasso penalized logistic model with our basic logistic
regression model. Figure 13 displays the performance of the two models using their
cross-validated ROC curves. There is little difference in ROC curve performance between
the two models. Favoring the simpler of two similarly performing models, we recommend the
basic logistic regression model for implementation in a health care setting. There are a
few reasons for this. First and foremost is ease of implementation: a logistic regression
with two explanatory variables, one fewer than the lasso model, is easy to replicate and
reproduce. In addition, the log-odds and probability of being a high user are explicit
under the logistic regression model and thus, conceptually, easy to communicate.
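To show what "explicit" means here, the sketch below evaluates the log-odds and the
probability for a two-variable logistic regression with no interaction term. The
coefficients are hypothetical placeholders chosen only to make the arithmetic visible;
they are not the fitted values from this study.

```python
import math


def log_odds(n_sources, n_diagnoses, b0, b_sources, b_diagnoses):
    """Linear predictor of a two-variable logistic model without interactions."""
    return b0 + b_sources * n_sources + b_diagnoses * n_diagnoses


def probability(log_odds_value):
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))


# HYPOTHETICAL coefficients for illustration only; NOT the study's estimates.
B0, B_SOURCES, B_DIAGNOSES = -5.0, 0.8, 0.1

# A made-up patient with 3 prescription sources and 10 diagnoses.
eta = log_odds(3, 10, B0, B_SOURCES, B_DIAGNOSES)
p_high_user = probability(eta)
```

With a model of this form, a clinic could compute a patient's score from two counts using
a spreadsheet or a hand calculator, which is the practical appeal of the simpler model.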


Figure 13. Cross-Validated ROC Curves Comparing Basic Logistic Model with Lasso Penalized
Logistic Model (plot title: Basic Model vs Lasso Penalized Logistic Model)

With our basic logistic regression model selected as the preferred model for
implementation, we evaluate its performance on the test set. We do so with a more
practical, functional measure of model performance: the lift curve. Think of the estimated
probability of high opioid use (or equivalently, the estimated log-odds) as an
individual’s score. Now suppose we compute this score for all opioid users in a population
and rank the scores from highest to lowest. If the model is useful, the sub-population
with the highest scores should contain a larger proportion of high opioid users than the
entire population does. Lift is defined as the ratio of the proportion of high users in
the sub-population to the proportion of high users in the general population. The lift
plot gives lift as a function of the proportion of the population included in the
sub-population. Lift curves always reach the point (1, 1) because, by dedicating resources
to 100% of the population, the probability of identifying a high user is the same as the
actual proportion of high users in the data set (Witten et al., 2011).
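The definition of lift translates directly into a short calculation: rank by score, take
the top fraction, and compare its rate of high users with the overall rate. The sketch
below uses ten made-up patients; the function name and data are illustrative only.

```python
def lift_at(y_true, scores, fraction):
    """Lift for the top `fraction` of the population when ranked by score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_top = max(1, round(fraction * len(ranked)))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Hypothetical population of ten opioid users, two of whom are high users.
y = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

lift_top_20_percent = lift_at(y, scores, 0.2)
lift_everyone = lift_at(y, scores, 1.0)
```

Targeting the whole population gives a lift of exactly one, which is why every lift curve
ends at the point (1, 1).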

Figure 14 shows the lift curve for the basic model on the test set. Because of the low
number of high users in the test data set, the lift curve varies considerably when the
proportion of the total population is less than 0.05. The lift curve passes through the
point (0.10, 4); that is, at a population proportion of 0.10 the lift is four. In
practical terms, by dedicating resources to only the 10% of the population with the
highest scores, we find high users at four times the actual rate. Since the actual
percentage of high users in our test set is 4%, the probability of identifying a high user
in this sub-population improves to 16%. Likewise, if we dedicate resources to 50% of the
population, the lift is two, meaning the probability of identifying a high user is
doubled. So depending on how well we want to identify high users or how many resources we
have available, we can vary the size of the targeted sub-population to increase our
probability of identifying a high user.
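The figures quoted above can be turned into two related quantities. The rate of high users
within the targeted group is the lift times the base rate, and the share of all high users
that the targeted group contains is the lift times the targeted fraction (this follows
from the definition of lift). The sketch below works through both, using the 4% base rate
and the two lift values read from the curve; values read from a plot are approximate, so
the results should be treated as rough.

```python
base_rate = 0.04  # proportion of high users in the test set, from the text

# (targeted fraction of the population, lift read from the lift curve)
readings = [(0.10, 4.0), (0.50, 2.0)]

results = {}
for fraction, lift in readings:
    results[fraction] = {
        # Proportion of high users among the targeted patients.
        "rate_in_targeted_group": lift * base_rate,
        # Share of all high users who fall inside the targeted group.
        "share_of_high_users_captured": lift * fraction,
    }
```

Under these readings, targeting the top 10% would reach roughly 40% of all high users,
and a lift of two at 50% would imply that essentially all high users fall in the top half
of the ranking.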

We should note that the lift plot in Figure 14 is only an estimate of model performance.
However, it illustrates how a model like the basic model might be useful in practice. In
the next chapter, we outline how our modeling efforts can be improved for use as an
operational health care tool.
Figure 14. Lift Curve of the Basic Logistic Regression Model on the Test Set (plot title:
Lift Curve of Basic Model on Test Set; y-axis: lift value)
V. CONCLUSIONS AND RECOMMENDATIONS


Our goal for this thesis is to examine opioid users in the AD military population and to
build a model that estimates the probability that an opioid-using individual is a high
user as defined by BUMED. To build a well-performing model, we examined many explanatory
variables that potentially influence an individual’s opioid use. Our approach was to
select, if possible, a simple, executable model and, as a result, we reduced the initial
91 variables to two.

The  model  that  we  recommend  is  a  logistic  regression  model  with  two 
explanatory  variables,  with  no  interactions.  Those  two  variables  are  the  number  of 
prescription  sources  for  the  opioid  medications  and  the  total  number  of  diagnoses  from 
the  M2  risk  file.  Although  simpler,  this  model  performed  well  when  compared  to  the 
more  complex  machine  learning  models. 

This logistic regression model has the potential to help Navy Medicine make important
decisions for its opioid-prescribed patients. The tool estimates the probability that an
opioid user is a high user. With this information, health care leaders can better plan and
manage finite resources to focus on the prevention and treatment of the higher-risk
patients. This concentrated coordination of care can result in improved patient care for
this sub-population, reduced long-term cost for the military health care system and,
overall, a more medically ready military force.

Because of the limited demographic data (for example, gender is not included among our
explanatory variables) and the time period of the data set, which spans only two years,
there are factors associated with high opioid use that may not be accounted for.
Additionally, the nature of military jobs involves changing duty stations and MTF
assignments every few years. Because we observe only AD members assigned to the Puget
Sound area clinics, high opioid users who change enrollment sites away from those clinics
are under-represented. It also means that AD members assigned to Puget Sound might not
represent a cross-section of the general Navy population. Lastly, the study analyzed AD
patients who were already prescribed opioids in order to examine factors that can
contribute to high opioid use. The analysis therefore excludes the majority of AD
patients, who were not prescribed opioids.

For this reason, a suitable follow-on study could examine data on patients who were not
prescribed opioids in order to assess the risk factors for becoming an opioid user. The
follow-on study should also consider other explanatory variables, such as gender and type
of duty, and be expanded to include other MTFs. Further, if complete records of opioid
prescription use are not available because patients change duty stations, then the dates
on which a patient enrolls in and dis-enrolls from the MTF must be included. With that
said, this research is simply an initial effort to explore ways to identify
opioid-prescribed patients who may be at greater risk of becoming high opioid users.
APPENDIX A. CANCER DIAGNOSIS CODES

The  following  table  lists  the  codes  associated  with  cancer  diagnosis  (Ellis,  2015). 


Cancer  of  head  and  neck 


1400  1401  1403  1404  1405  1406  1408  1409  1410  1411  1412  1413  1414  1415  1416  1418  1419  1420  1421  1422  1428  1429  1430 

1431  1438  1439  1440  1441  1448  1449  1450  1451  1452  1453  1454  1455  1456  1458  1459  1460  1461  1462  1463  1464  1465  1466 

1467  1468  1469  1470  1471  1472  1473  1478  1479  1480  1481  1482  1483  1488  1489  1490  1491  1498  1499  1600  1601  1602  1603 

1604 1605 1608 1609 1610 1611 1612 1613 1618 1619 1950 2300 2310 V1001 V1002 V1021

Cancer  of  esophagus 

1500  1501  1502  1503  1504  1505  1508  1509  2301  V1003 

Cancer  of  stomach 

1510  1511  1512  1513 1514  1515  1516  1518  1519  20923  2302  V1004 

Cancer  of  colon 

1530  1531  1532  1533  1534  1535  1536  1537  1538  1539  1590  20910  20911  20912  20913  20914  20915  20916  2303  V1005 

Cancer  of  rectum  and  anus 

1540  1541  1542  1543  1548  20917  2304  2305  2306  79670  79671  79672  79673  79674  79676  V1006 

Cancer  of  liver  and  intrahepatic  bile  duct 

1550 1551 1552 2308 V1007

Cancer  of  pancreas 

1570  1571  1572  1573 1574  1578  1579 

Cancer of other GI organs: peritoneum

1520  1521  1522  1523  1528  1529  1560  1561  1562  1568  1569  1580  1588  1589  1591  1598  1599  20900  20901  20902  20903  2307 
2309  V1000  V1009 

Cancer  of  bronchus:  lung 

1622  1623  1624  1625  1628  1629  20921  2312  V1011 

Cancer:  other  respiratory  and  intrathoracic 

1620  1630  1631  1638  1639  1650  1658  1659  2311  2318  2319  V1012  V1020  V1022 

Cancer  of  bone  and  connective  tissue 

1700  1701  1702  1703  1704  1705  1706  1707  1708  1709  1710  1712  1713  1714  1715  1716  1717  1718  1719 

Melanomas  of  skin 

1720  1721  1722  1723  1724  1725  1726  1727  1728  1729  V1082 

Other  non-epithelial  cancer  of  skin 

1730  17300  17301  17302  17309  1731  17310  17311  17312  17319  1732  17320  17321  17322  17329  1733  17330  17331  17332 

17339  1734  17340  17341  17342  17349  1735  17350  17351  17352  17359  1736  17360  17361  17362  17369  1737  17370  17371 

17372  17379  1738  17380  17381  17382  17389  1739  17390  17391  17392  17399  20931  20932  20933  20934  20935  20936  2320 

2321 2322 2323 2324 2325 2326 2327 2328 2329 V1083

Cancer  of  breast 

1740  1741  1742  1743  1744  1745  1746  1748  1749  1750  1759  2330  V103 

Cancer  of  uterus 

179 1820 1821 1828 2332 V1042
Cancer of cervix


1800  1801  1808  1809  2331  7950  79506  V1041  79501  79502  79503  79504 

Cancer  of  ovary 

1830 V1043

Cancer  of  other  female  genital  organs 

181  1832  1833  1834  1835  1838  1839  1840  1841  1842  1843  1844  1848  1849  2333  23330  23331  23332  23339  79516  V1040 

V1044 

Cancer  of  prostate 

185 2334 V1046

Cancer  of  testis 

1860  1869  V1047 

Cancer  of  other  male  genital  organs 

1871  1872  1873  1874  1875  1876  1877  1878  1879  2335  2336  V1045  V1048  V1049 

Cancer  of  bladder 

1880  1881  1882  1883  1884  1885  1886  1887  1888  1889  2337  V1051 

Cancer  of  kidney  and  renal  pelvis 

1890 1891 20924 V1052 V1053

Cancer  of  other  urinary  organs 

1892 1893 1894 1898 1899 2339 V1050 V1059

Cancer  of  brain  and  nervous  system 

1910  1911  1912  1913  1914  1915  1916  1917  1918  1919  1920  1921  1922  1923  1928  1929  V1085  V1086 

Cancer  of  thyroid 

193 25802 25803 V1087

Hodgkin's  disease 

20100 20101 20102 20103 20104 20105 20106 20107 20108 20110 20111 20112 20113 20114 20115 20116 20117 20118 20120
20121  20122  20123  20124  20125  20126  20127  20128  20140  20141  20142  20143  20144  20145  20146  20147  20148  20150  20151 
20152  20153  20154  20155  20156  20157  20158  20160  20161  20162  20163  20164  20165  20166  20167  20168  20170  20171  20172 
20173 20174 20175 20176 20177 20178 20190 20191 20192 20193 20194 20195 20196 20197 20198 V1072

Non-Hodgkin's  lymphoma 

20000 20001 20002 20003 20004 20005 20006 20007 20008 20010 20011 20012 20013 20014 20015 20016 20017 20018 20020
20021  20022  20023  20024  20025  20026  20027  20028  20030  20031  20032  20033  20034  20035  20036  20037  20038  20040  20041 
20042  20043  20044  20045  20046  20047  20048  20050  20051  20052  20053  20054  20055  20056  20057  20058  20060  20061  20062 
20063  20064  20065  20066  20067  20068  20070  20071  20072  20073  20074  20075  20076  20077  20078  20080  20081  20082  20083 
20084 20085 20086 20087 20088 20200 20201 20202 20203 20204 20205 20206 20207 20208 20210 20211 20212 20213 20214
20215  20216  20217  20218  20220  20221  20222  20223  20224  20225  20226  20227  20228  20270  20271  20272  20273  20274  20275 
20276  20277  20278  20280  20281  20282  20283  20284  20285  20286  20287  20288  20290  20291  20292  20293  20294  20295  20296 
20297  20298  V1071  V1079 
Leukemias


20240 20241 20242 20243 20244 20245 20246 20247 20248 2031 20310 20311 20312 2040 20400 20401 20402 2041 20410
20411 20412 2042 20420 20421 20422 2048 20480 20481 20482 2049 20490 20491 20492 2050 20500 20501 20502 2051
20510 20511 20512 2052 20520 20521 20522 2053 20530 20531 20532 2058 20580 20581 20582 2058 20580 20581 20582
2060 20600 20601 20602 2061 20610 20611 20612 2062 20620 20621 20622 2068 20680 20681 20682 2068 20690 20691
20692 2070 20700 20701 20702 2071 20710 20711 20712 2072 20720 20721 20722 2078 20780 20781 20782 2080 20800
20801 20802 2081 20810 20811 20812 2062 20820 20821 20822 2068 20680 20881 20882 2089 20890 20881 20892 V1060

V1061  V1062  V1063  V1069 

Multiple  myeloma 

2030  20300  20301  20302  2038  20380  20381  20382 

Cancer  other  and  unspecified  primary 

1640  1641  1642  1643  1648  1649  1760  1761  1762 1763  1764  1765  1768  1769  1900  1901  1902  1903  1904  1905  1906  1907  1908 
1909 1940 1941 1943 1944 1945 1946 1948 1949 1951 1952 1953 1954 1955 1958 20230 20231 20232 20233 20234 20235

20236  20237  20238  20250  20251  20252  20253  20254  20255  20256  20257  20258  20260  20261  20262  20263  20264  20265  20266 
20267 20268 20922 20925 20926 20927 2340 2348 2349 7951 79510 79511 79512 79513 79514 V1029 V1081 V1084 V1088
V1089  V109  V1090  V1091  V711 

Secondary  malignancies 

1960 1961 1962 1963 1965 1966 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1960 1981 1982 1983 1964 1965
1966  1967  19681  19882  19889  20971  20972  20973  20974  51181  78851 

Malignant  neoplasm  without  specification  of  site 

1990 1991 1992 20920 20929 20930 20970 20975 20979

Neoplasms  of  unspecified  nature  or  uncertain  behavior 

2350  2351  2352  2353  2354  2355  2356  2357  2358  2358  2360  2361  2362  2363  2364  2365  2366  2367  23690  23691  23699  2370 
2371 2372  2373  2374  2375  2376  2377  23770  23771 23772  23773  23779  2379  2380  2381  2382  2383  2384  2385  2386  2387 

23871  23872  23873  23874  23875  23876  23877  23879  2388  2388  2390  2391  2392  2393  2394  2395  2396  2397  2398  23981 
23989 2399

Maintenance  chemotherapy,  radiotherapy 

V580 V581 V5811 V5812 V661 V662 V671 V672

Benign  neoplasm  of  uterus 

2180  2181  2182  2189  2190  2191  2196  2199 

Other  and  unspecified  benign  neoplasm 

20940 20941 20942 20943 20950 20951 20952 20953 20954 20955 20956 20957 20960 20961 20962 20963 20964 20965 20966
20967 20969 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119

2120  2121  2122  2123  2124  2125  2126  2127  2128  2129  2130  2131  2132  2133  2134  2135  2136  2137  2138  2139  2140  2141  2142 
2143 2144 2148 2149 2150 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169
217 220 2210 2211 2212 2218 2219 2220 2221 2222 2223 2224 2228 2229 2230 2231 2232 2233 22381 22388 2239 2240 2241
2242  2243  2244  2245  2246  2247  2248  2249  2250  2251  2252  2253  2254  2258  2259  226  2270  2271  2273  2274  2275  2276  2278 
2279  22900  22801  22802  22803  22804  22809  2281  2290  2298  2299  V1272 
APPENDIX B. LIST OF 91 EXPLORATORY VARIABLES


Age.Group.Code*                             Neoplasm.Cancer..High.
Acute.Reaction.to.Stress*                   Neoplasm.Cancer..Very.High.
Risk.Score..No.Truncation*                  Non.Psychotic.Disorders.of.Childhood
Days.Since.Most.Recent.OCO.Depl*            Osteoarthrosis
Ever.Deployed.Flag..OCO.*                   Oth.Musculoskel.Sys..and.Connective.Tissue
Anxiety.Related.Disorders                   Other.Congenital.Anomalies
Arthropathy                                 Other.Digestive.System.Diseases
Asthma++                                    Other.Injury..Low.
Bone.Joint.Muscle.Infections.Necrosis       Other.Injury..Med.
Cardiac..High..Rx                           Other.Injury..High.
Case.Management.Acuity.Level*               Other.Mycoses
Central.Nervous.System..Low.                Other.Neurotic.Disorders++
Central.Nervous.System..High.               Other.Non.Psychotic.Depressive.Disorders++
Cerebrovascular.Disease                     Other.Non.Psychotic.Disorders++
Chronic.Ulcer.of.Skin..Except.Decubitus     Other.Psychotic.Disorders++
Circulatory.Cardiovascular..Low.            Personality.Disorders
Circulatory.Cardiovascular..Med.            Polyneuropathy
Circulatory.Cardiovascular..High.           Psychotic.Disorders.of.Childhood++
Circulatory.Cardiovascular..Very.High.      PTSD++
Congestive.Heart.Failure                    Pulmonary.Respiratory..Low.
CP..Hemorrhage..Other.Paralytic.Syn         Pulmonary.Respiratory..Med.
Diabetes                                    Pulmonary.Respiratory..High.
Cystic.Fibrosis                             Quadriplegia..Other.Extensive.Paralysis
Diseases.of.the.Blood..Low.                 Renal.Failure..Low.
Diseases.of.the.Blood..Med.                 Renal.Failure..Med.
Diseases.of.the.Blood..High.                Renal.Failure..High.
Diseases.of.the.Breast                      Respirator.Arrest.Dependence.Trach.Stat
Diseases.of.Ear.Mastoid.Process             Schizophrenic.Disorder
Diseases.of.Genitourinary.System            Seizure.Disorders.and.Convulsions
Disorders.of.Immunity                       Seretonin.3.Receptor.Antagonist.Rx++
Disorders.of.the.Eye.and.Adnexa             Skin.and.Subcutaneous.Tissue..Low.
Disturbance.of.Conduct++                    Skin.and.Subcutaneous.Tissue..Med.
Dorsopathies..Low.                          Skin.and.Subcutaneous.Tissue..High.
Dorsopathies..High.                         Substance.Dependence
Endocrine..Metabolic..Immune.Dis..Low       Substance.Abuse++
Endocrine..Metabolic..Immune.Dis..High.     Substance.Induced.Mental.Disorders++
Fracture.Dislocation..Low.                  Traumatic.Brain.Injury..Low.++
Fracture.Dislocation..Med.                  Traumatic.Brain.Injury..High.++
Fracture.Dislocation..High.                 Vert.Fractures..Spinal.Cord.Dis.Injury
GI.Infectious.Parasitic..Low.               Vascular.Disease
GI.Infectious.Parasitic..Med.               D_bool*
GI.Infectious.Parasitic..High.              M_bool*
Maj.CC.of.Medical.Care.and.Trauma           R_bool*
Multiple.Sclerosis                          T_bool*
Neoplasm.Cancer..Low.                       V_bool*
Neoplasm.Cancer..Med.

*  Denotes  variables  added  that  are  not  part  of  the  M2  medical  risk  conditions. 
++  Denotes  variables  not  in  the  WRA  model. 
APPENDIX C. MEDICAL CONDITION CATEGORIES IN WRA MODEL


The  following  table  lists  the  90  condition  categories  in  the  Wakely  Risk  Assessment 
Model  (Mehmud,  2012). 


WRA#   WRA Category Description

WRA1   Arthropathies
WRA2   Bone/Joint/Muscle Infections/Necrosis
WRA3   Central Nervous System
WRA4   Central Nervous System (H)
WRA5   Cerebral Palsy, Hemorrhage and Other Paralytic Syndromes
WRA6   Cerebrovascular Disease
WRA7   Chronic Ulcer of Skin, Except Decubitus
WRA8   Circulatory/Cardiovascular (H)
WRA9   Circulatory/Cardiovascular (L)
WRA10  Circulatory/Cardiovascular (M)
WRA11  Cirrhosis of Liver*
WRA12  Congestive Heart Failure
WRA13  Cystic Fibrosis
WRA14  Diabetes with Ophthalmologic or Unspecified Manifestation
WRA15  Diabetes with Renal or Other Specified Manifestation
WRA16  Diabetes without Complication
WRA17  Dialysis Status*
WRA18  Diseases of the Blood (H)
WRA19  Diseases of the Blood (L)
WRA20  Diseases of the Blood (M)
WRA21  Diseases of the Blood (VH)
WRA22  Diseases of the Ear/Mastoid Process
WRA23  Diseases of the Genitourinary System
WRA24  Disorders of Immunity
WRA25  Disorders of the Eye & Adnexa
WRA26  Dorsopathies
WRA27  Dorsopathies (H)
WRA28  Drug/Alcohol Psychosis or Dependence
WRA29  Endocrine, Metabolic, and Immunity Disorders
WRA30  Endocrine, Metabolic, and Immunity Disorders (H)
WRA31  End-Stage Liver Disease*
WRA32  EXCL*
WRA33  Fracture/Dislocation
WRA34  Gastrointestinal/Infectious/Parasitic (H)
WRA35  Gastrointestinal/Infectious/Parasitic (L)
WRA36  Gastrointestinal/Infectious/Parasitic (M)
WRA37  HIV/AIDS*
WRA38  Inflammatory Bowel Disease*
WRA39  Injury/Poisoning
WRA40  Lymphatic, Head and Neck, Brain, and Other Major Cancers (H)*
WRA41  Lymphatic, Head and Neck, Brain, and Other Major Cancers (L)*
WRA42  Lymphatic, Head and Neck, Brain, and Other Major Cancers (M)*
WRA43  Major Complications of Medical Care and Trauma
WRA44  Major Depressive, Bipolar, and Paranoid Disorders
WRA45  Major Organ Transplant Status*
WRA46  Mental Disorders
WRA47  Mental Disorders (H)
WRA48  Metastatic Cancer and Acute Leukemia*
WRA49  Multiple Sclerosis
WRA50  Neonate*
WRA51  Neonate (H)*
WRA52  Neoplasm of Bone, Connective Tissue, Skin, & Breast
WRA53  Neoplasm of Bone, Connective Tissue, Skin, & Breast (H)
WRA54  Neoplasm of Digestive/Peritoneum
WRA55  Nephritis*
WRA56  Osteoarthrosis
WRA57  Other Congenital Anomalies
WRA58  Other Digestive System Diseases
WRA59  Other Heart Disease
WRA60  Other Infectious & Parasitic Diseases*
WRA61  Other Infectious & Parasitic Diseases (H)*
WRA62  Other Musculoskeletal System & Connective Tissue
WRA63  Other Mycoses
WRA64  Other Neoplasm
WRA65  Other Pulmonary/Respiratory
WRA66  Other Rare*
WRA67  Other Transplant Related*
WRA68  Parkinson's and Huntington's, other motor control Diseases*
WRA69  Polyneuropathy
WRA70  Pregnancy (Incomplete)*
WRA71  Pregnancy Related*
WRA72  Proliferative Diabetic Retinopathy and Vitreous Hemorrhage*
WRA73  Protein-Calorie Malnutrition*
WRA74  Pulmonary/Respiratory (H)
WRA75  Pulmonary/Respiratory (L)
WRA76  Pulmonary/Respiratory (M)
WRA77  Quadriplegia, Other Extensive Paralysis
WRA78  Renal Failure (H)
WRA79  Renal Failure (L)
WRA80  Renal Failure (M)
WRA81  Respirator Arrest, Dependence/Tracheostomy Status
WRA82  Rheumatoid Arthritis and Inflammatory Connective Tissue Disease*
WRA83  Schizophrenia
WRA84  Seizure Disorders and Convulsions
WRA85  Septicemia/Shock
WRA86  Skin & Subcutaneous Tissue
WRA87  Skin & Subcutaneous Tissue (H)
WRA88  Vascular Disease
WRA89  Vertebral Fractures, Spinal Cord Diseases/Injury
WRA90  Very Severe Neoplasm / Cancer

*Denotes  medical  conditions  that  were  not  included  in  this  study. 
APPENDIX D. CASE MANAGEMENT (CM) ACUITY LEVELS


The  information  in  the  following  table  defines  the  CM  acuity  level  (DHA,  2016). 


1  Low: 1-150 minutes per month (0-2.5 hrs per month).
   Routine discharge planning, minimal intervention(s).

2  Low to moderate: 151-360 minutes per month (2.75-6.00 hours per month).
   Stable with ongoing needs, chronic care intervention, infrequent ER/inpatient utilization.

3  Moderate: 361-555 minutes per month (6.25-9.25 hours per month).
   Stable with more complicated ongoing needs, frequent ER/inpatient utilization.

4  Moderate to Intense: 556-750 minutes per month (9.5-12.5 hrs per month).
   Multiple acute needs.

5  Intense: 751 minutes and above per month (12.75+ hrs per month).
   Intensive assessment and/or monitoring required.
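
The minute ranges in the table above amount to a threshold lookup. The following is a minimal illustrative sketch in Python, not part of the study's analysis. The function name cm_acuity_level is hypothetical, the input is assumed to be whole minutes of case management per month, and returning None below 1 minute is an assumption, since the table does not define a level for zero minutes.

```python
from bisect import bisect_left
from typing import Optional

# Upper bounds, in minutes per month, of acuity levels 1 through 4 as listed
# in the table. Level 5 is open-ended (751 minutes and above).
UPPER_BOUNDS = [150, 360, 555, 750]


def cm_acuity_level(minutes_per_month: int) -> Optional[int]:
    """Map CM minutes per month to an acuity level from 1 to 5.

    Returns None for values below 1 minute (an assumption; the table
    starts at 1 minute and does not cover zero).
    """
    if minutes_per_month < 1:
        return None
    # bisect_left counts how many upper bounds lie strictly below the value,
    # which is the zero-based index of the matching level.
    return bisect_left(UPPER_BOUNDS, minutes_per_month) + 1
```

For example, cm_acuity_level(400) falls in the 361-555 range and returns 3.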


LIST  OF  REFERENCES 


Breiman,  L.  (2001).  Random  forests.  Machine  Learning ,  45,  5-32. 

Broad,  J.  M.  (2016,  February).  Active  duty  high  users  of  chronic  pain  medication. 
Portsmouth,  VA:  Navy  and  Marine  Corps  Public  Health  Center. 

Centers  for  Disease  Control  and  Prevention  (CDC).  (2015).  Increases  in  drug  and  opioid 
overdose  deaths  —  United  States,  2000-2014.  Retrieved  from 
http://www.cdc.gov/drugoverdose/data/index.html

Childress,  S.  (2016,  March  28).  Veterans  face  greater  risks  amid  opioid  crisis.  Retrieved 
from  http://www.pbs.org/wgbh/frontline/article/veterans-face-greater-risks-amid- 
opioid-crisis/ 

Chou,  R.,  Dowell,  D.,  &  Haegerich,  T.  M.  (2016).  CDC  guideline  for  prescribing  opioids 
for  chronic  pain — United  States,  2016  (MMWR  Recomm.  Rep.  65).  Atlanta,  GA: 
CDC. Retrieved from
http://www.cdc.gov/media/modules/dpk/2016/dpk-pod/rr6501e1er-ebook.pdf

Defense  Health  Agency  (DHA).  (2016).  MDR  data  dictionary.  Falls  Church,  VA:  Author. 

Ellis,  J.  W.  (2015,  September  24).  Non-cancer  chronic  pain  analysis:  Status  update  for 
active  duty.  Portsmouth,  VA:  Navy  and  Marine  Corps  Public  Health  Center. 

Faraway,  J.  J.  (2015).  Linear  models  with  R  (2nd  ed.).  Boca  Raton,  FL:  CRC  Press, 
Taylor  &  Francis  Group. 

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear
models  via  coordinate  descent.  Journal  of  Statistical  Software,  33(1),  1-22. 
Retrieved  from  http://www.jstatsoft.org/v33/i01/ 

Goeman, J. J. (2010). L1 penalized estimation in the Cox proportional hazards model.
Biometrical  Journal,  52(1),  70-84. 

Goff,  G.,  &  Sayers,  K.  (Comps.).  (2015).  Navy  Medicine  strategic  plan  FY15.  Falls 
Church,  VA:  Bureau  of  Medicine  and  Surgery. 

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data
mining, inference, and prediction (2nd ed.). New York, NY: Springer.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical
learning with applications in R. New York, NY: Springer.

Kenny, T. (2013, October 12). Acute stress reaction. Retrieved from
http://patient.info/health/acute-stress-reaction-leaflet 


Kuhn, M. (2016). Caret: Classification and regression training. Contributions from
Wing, J., Weston, S., Williams, A., Keefer, C., Engelhardt, A., Cooper, T., Mayer,
Z., Kenkel, B., the R Core Team, Benesty, M., Lescarbeau, R., Ziem, A.,
Scrucca, L., Tang, Y., and Candan, C. R package version 6.0-71. Retrieved from
https://CRAN.R-project.org/package=caret

Levy,  R.  A.,  Netzer,  P.,  &  Pikulin,  L.  (2014,  September).  Opioid  use  and  low  back  pain 
among the Navy beneficiary population. Arlington, VA: Center for Naval
Analyses. 

Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News,
2(3),  18-22. 

McDonald,  J.  H.  (2009).  Handbook  of  biological  statistics  (2nd  ed.).  Baltimore,  MD: 
Sparky  House  Publishing.  Retrieved  from 
http://udel.edu/~mcdonald/statfishers.html 

Mehmud, S. M. (2012, January). Wakely risk assessment model (Version 1.02).
Retrieved from
https://predictivemodeler.com/sitecontent/book/Ch06_Applications/Actuarial/WRA_Model/Versions/v1.02/Wakely%20Risk%20Assessment%20Model%20-%20White%20Paper.pdf

National  Institute  on  Drug  Abuse  (NIDA).  (2014).  Opioids  [fact  sheet].  Retrieved  from 
https://www.drugabuse.gov/publications/research-reports/prescription-drugs/opioids

PTSD: National Center for PTSD. (2015, August 13). Retrieved from
http://www.ptsd.va.gov/public/problems/pain-ptsd-guide-patients.asp

R  Core  Team.  (2016).  R:  A  language  and  environment  for  statistical  computing.  Vienna, 
Austria: R Foundation for Statistical Computing. Retrieved from
https://www.R-project.org/

Ridgeway, G., with contributions from others. (2015). gbm: Generalized Boosted
Regression Models. R package version 2.1.1. Retrieved from
https://CRAN.R-project.org/package=gbm

Rudd,  R.  A.,  Aleshire,  N.,  Zibbell,  J.  E.,  &  Gladden,  R.  M.  (2016,  January  1).  Increases 
in  drug  and  opioid  overdose  deaths — United  States,  2000-2014.  Morbidity  and 
Mortality  Weekly  Report,  64(50):  1378-82.  Retrieved  from 
http://www.cdc.gov/mmwr/preview/mmwrhtml/mm6450a3.htm

Seal,  K.  (2014,  January  7).  Opioids  in  chronic  pain  and  PTSD:  Liability  or  potential 
therapy?  Speech  presented  at  Spotlight  on  Pain  Management.  Retrieved  from 
http://www.hsrd.research.va.gov/for_researchers/cyber_seminars/archives/video_archive.cfm?SessionID=791



Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society, Series B (Methodological), 58(1), 267-288. Retrieved from
http://www.stat.washington.edu/courses/stat527/s13/readings/j_royal_stat_soc_b1996.pdf

Witten,  I.  H.,  Frank,  E.,  &  Hall,  M.  A.  (2011).  Data  mining:  Practical  machine  learning 
tools  and  techniques  (3rd  ed.).  Burlington,  MA:  Morgan  Kaufmann. 



INITIAL  DISTRIBUTION  LIST 


1.  Defense  Technical  Information  Center 
Ft.  Belvoir,  Virginia 

2.  Dudley  Knox  Library 
Naval  Postgraduate  School 
Monterey,  California 




